Abstract

Breakpoint graphs are ubiquitous structures in the field of genome rearrange- 
ments. Their cycle decomposition has proved useful in computing and bounding 
many measures of (dis)similarity between genomes, and studying the distribution
of those cycles is therefore critical to gaining insight on the distributions
of the genomic distances that rely on it. We extend here the work initiated
by Doignon and Labarre [1], who enumerated unsigned permutations whose
breakpoint graph contains k cycles, to signed permutations, and prove explicit 
formulas for computing the expected value and the variance of the correspond- 
ing distributions, both in the unsigned case and in the signed case. We also 
show how our results can be used to derive simpler proofs of other previously 
known results. Finally, we compare the distribution of the number of cycles in 
breakpoint graphs of unsigned and signed permutations to the distributions of 
several well-studied genomic distances, emphasising the cases where approxima- 
tions obtained in this way stand out. 

Keywords: Genome rearrangements, Hultman numbers, Permutations 



1. Introduction 

The field of comparative genomics is concerned with quantifying similarity 
or divergence between organisms. Several measures have been proposed to that
end, including pattern matching based approaches or edit distances relying on
a given set of biologically relevant operations. A standard example of such a
method, and a de facto standard in phylogenetics, is the approach based on
sequence alignment, which is motivated by the observation that genomes evolve
by point mutations and aims at explaining evolution by replacements, insertions
or deletions of single nucleotides (see e.g. Li and Homer [2] for a recent account
of sequence alignment techniques and their uses).

However, genomes also evolve by large-scale mutations that act on whole 
segments of the genome, as opposed to point mutations. Examples of such mu- 
tations include reversals, which reverse the order of elements along a segment, 
transpositions, which move segments to another location, and translocations, 
which exchange segments that belong to different chromosomes. Many models 
have been proposed for studying those genome rearrangements, which vary ac- 
cording to the kinds of mutations one wants to take into account, how these 
should be weighted, or which objects are best suited for representing genomes 
(see e.g. Fertin et al. [3] for an extensive survey). Nonetheless, a striking
similarity between all these models is how heavily they rely on variants of a
graph first introduced by Bafna and Pevzner [4], known as the breakpoint graph,
and its decomposition into edge- or vertex-disjoint cycles, which has proved most useful in
obtaining extremely tight bounds on many genome rearrangement distances, as 
well as formulas for computing the exact distance in several cases. The link be- 
tween several genomic distances and the number of cycles in breakpoint graphs 
will be discussed in more detail in Section 

Many mathematical questions arise when studying genome rearrangement 
distances, particularly concerning their distributions, as well as related statisti- 
cal parameters. Since quite a few such distances can be computed or approxi- 
mated using the cycle decomposition of the breakpoint graph, investigating the 
distribution of such cycles appears as a natural, general and effective starting 
point to answering those questions. We will restrict our attention in this paper to 
the permutation model, which can be used when all genomes under comparison 
consist of exactly the same genes (but in a different order) without duplications.
Breakpoint graphs can be associated to permutations, and the distribution of
cycles in this case was first characterised by Doignon and Labarre [1], which
later led Bóna and Flynn [5] to prove a very simple expression for the expected
value of the block-interchange distance originally introduced by Christie [6]. 

However, it has often been argued that signed permutations provide a more 
realistic model of evolution, since signs can be used to represent on which strand 
a given DNA segment is located. Using this model, Székely and Yang [7]
obtained bounds for the expectation and the variance of the number of cycles in
the breakpoint graph of a random signed permutation. Using the finite Markov
chain embedding technique, Grusea [8] obtained the distribution of the number
of cycles in the breakpoint graph of a random signed permutation in the form
of a product of transition probability matrices of a certain finite Markov chain.
Her method makes it possible to derive recurrence formulas and to compute this
distribution numerically, but its computational complexity is quite high, which
limits its practical applications.

In this work, we obtain a new expression for computing the number of un- 
signed permutations whose breakpoint graph contains a given number of cycles, 
as well as what is to the best of our knowledge the first analytic expression for 
computing the number of signed permutations whose breakpoint graph contains 
a given number of cycles. The formula obtained in the signed case is compli- 
cated, but we obtain simpler formulas for a couple of restricted cases. We also 
use our results to derive elementary proofs of previously known results, including 
a binomial identity and the distribution of the number of cycles in the break- 
point graph of an unsigned permutation. We prove formulas for computing the 
expected value and the variance of the distribution of those cycles, both in the 
unsigned case and in the signed case. Finally, we also discuss how the results we 
obtain relate to a number of widely-studied genome rearrangement distances, 
and in particular, how the distribution of cycles in breakpoint graphs can be 
used to approximate (and in some cases, to recover exactly) the distribution of 
those distances.

2. Notations and definitions



We recall here a few notions that will be used throughout the paper. We 
assume the reader is familiar with graph theory (if not, see e.g. Diestel [9]), but
nevertheless review a few useful definitions, if only to agree on notation. We 
will work with non-simple graphs, i.e. graphs that may contain loops (edges 
connecting a vertex to itself) as well as parallel edges. We will also work with 
both undirected and directed graphs, using {u, v} to denote edges in the former 
case and (u,v) to denote arcs in the latter. 

Definition 2.1. A matching M in a graph $G = (V, E)$ is a subset of pairwise
vertex-disjoint edges of E. It is a perfect matching of $U \subseteq V$ if every vertex in
U is incident to an edge in M.

Definition 2.2. A graph is k-regular if each of its vertices has degree k. 

In particular, if G is a 2-regular graph, then it decomposes in a unique way 
into a collection of edge- and vertex-disjoint cycles, up to the ordering of cycles 
and to rotations of elements within each cycle (i.e., (a, b, c, d) = (b, c, d, a)),
as well as directions in which cycles are traversed if G is undirected (i.e.,
(a, b, c, d) = (d, c, b, a)). This allows us to denote unambiguously by c(G) the
number of cycles in G. The length of a cycle is the number of vertices it contains,
and a k-cycle in G is a cycle of length k.

Definition 2.3. A graph is hamiltonian if it contains a cycle visiting every 
vertex exactly once. 

We now recall a few basic notions about permutations (for more details, see
e.g. Björner and Brenti [10] and Wielandt [11]).



Definition 2.4. A permutation of {1, 2, ..., n} is a bijective map of {1, 2, ..., n}
onto itself.

The symmetric group $S_n$ is the set of all permutations of {1, 2, ..., n}, together
with the usual function composition $\circ$, applied from right to left. We use
lower case Greek letters to denote permutations, typically $\pi = (\pi_1\ \pi_2\ \cdots\ \pi_n)$,
with $\pi_i = \pi(i)$, and in particular write the identity permutation as $\iota = (1\ 2\ \cdots\ n)$.

Definition 2.5. The graph $\Gamma(\pi)$ of a permutation $\pi \in S_n$ has vertex set
{1, 2, ..., n}, and contains an arc $(i, j)$ whenever $\pi_i = j$.

Definition 2.5 implies that $\Gamma(\pi)$ is 2-regular and as such decomposes in a
unique way into disjoint cycles (up to the ordering of cycles and to rotations of
elements within each cycle), which we refer to as the disjoint cycle decomposition
of $\pi$. It is also common to refer to a permutation as a k-cycle, if the only cycle
of length greater than 1 that its graph contains has length k. Figure 1 shows an
example of such a decomposition. To lighten the presentation, we will shorten
the notation $c(\Gamma(\pi))$ into $c(\pi)$, for a given permutation $\pi$.




Figure 1: The graph of the permutation $\pi$ = (2 4 1 3 5 8 7 9 6).
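
As a small illustration of Definition 2.5, the following Python sketch counts the cycles of the graph of a permutation by following the arcs $(i, \pi_i)$. The function name `count_cycles` and the encoding of a permutation as a list in one-line notation are illustrative assumptions, not notation taken from the paper.

```python
def count_cycles(perm):
    """Return c(pi) for a permutation of {1, ..., n} given in one-line notation."""
    n = len(perm)
    seen = [False] * (n + 1)  # index 0 is unused, elements are 1-based
    cycles = 0
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycles += 1
        current = start
        while not seen[current]:
            seen[current] = True
            current = perm[current - 1]  # follow the arc (i, pi_i)
    return cycles


# The permutation of Figure 1: cycles (1 2 4 3), (5), (6 8 9) and (7).
print(count_cycles([2, 4, 1, 3, 5, 8, 7, 9, 6]))  # 4
```

Fixed points count as cycles of length 1, which is why the example has four cycles rather than two.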

Definition 2.6. The conjugate of a permutation $\pi$ by a permutation $\sigma$, both
in $S_n$, is the permutation $\sigma \circ \pi \circ \sigma^{-1}$, and can be obtained by replacing every
element i in the disjoint cycle decomposition of $\pi$ with $\sigma_i$.

Definition 2.7. A signed permutation is a permutation of {1, 2, ..., n} where
each element has an additional "+" or "−" sign.

The hyperoctahedral group $S_n^{\pm}$ is the set of all signed permutations of n
elements, together with the usual function composition $\circ$, applied from right to
left. It is not mandatory for a signed permutation to have negative elements, so
$S_n \subset S_n^{\pm}$ since each permutation in $S_n$ can be viewed as a signed permutation
without negative elements. To lighten the presentation, we will conform to the 
tradition of omitting "+" signs for positive elements. 

Finally, we recall the definition of the following graph introduced by Bafna
and Pevzner [4], which turned out to be an extremely useful tool for studying
and solving genome rearrangement problems and which will be central to our 
discussions. 

Definition 2.8. Given a signed permutation $\pi$ in $S_n^{\pm}$, transform it into an
unsigned permutation $\pi'$ in $S_{2n}$ by mapping $\pi_i$ onto the sequence $(2\pi_i - 1, 2\pi_i)$
if $\pi_i > 0$, or $(2|\pi_i|, 2|\pi_i| - 1)$ if $\pi_i < 0$, for $1 \le i \le n$. The breakpoint graph
of $\pi$ is the undirected bicoloured graph $BG(\pi)$ with ordered vertex set
$(\pi'_0 = 0, \pi'_1, \pi'_2, \ldots, \pi'_{2n}, \pi'_{2n+1} = 2n + 1)$ and whose edge set is the union of the following
two perfect matchings of $V(BG(\pi))$:

• black edges $\delta_B(\pi) = \{\{\pi'_{2i}, \pi'_{2i+1}\} \mid 0 \le i \le n\}$;

• grey edges $\delta_G = \{\{2i, 2i + 1\} \mid 0 \le i \le n\}$.

We will often use the notation $BG(\pi) = \delta_B(\pi) \cup \delta_G$ to denote breakpoint graphs.
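
The construction of Definition 2.8 can be sketched in Python as follows. This is only an illustration: the function names, the list encoding of signed permutations and the use of `w ^ 1` to follow the grey edge {2i, 2i + 1} are assumptions of this sketch.

```python
def unsigned_image(signed_perm):
    """Return (pi'_0 = 0, pi'_1, ..., pi'_2n, pi'_2n+1 = 2n + 1) as in Definition 2.8."""
    image = [0]
    for x in signed_perm:
        if x > 0:
            image += [2 * x - 1, 2 * x]
        else:
            image += [2 * abs(x), 2 * abs(x) - 1]
    image.append(2 * len(signed_perm) + 1)
    return image


def breakpoint_graph_cycles(signed_perm):
    """Count the alternating cycles of BG(pi)."""
    image = unsigned_image(signed_perm)
    black = {}
    for i in range(0, len(image), 2):  # black edges {pi'_2i, pi'_2i+1}
        u, v = image[i], image[i + 1]
        black[u], black[v] = v, u
    seen = set()
    cycles = 0
    for start in range(len(image)):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]  # traverse a black edge
            seen.add(w)
            v = w ^ 1     # traverse the grey edge {2i, 2i + 1}
    return cycles


print(breakpoint_graph_cycles([1, 2, 3]))                 # 4 (identity: n + 1 cycles)
print(breakpoint_graph_cycles([-5, 1, 2, 4, -7, -3, 6]))  # 2
```

Since both edge sets are perfect matchings, every vertex has exactly one black and one grey neighbour, so alternately following the two colours visits each cycle exactly once.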

Genome rearrangement problems usually involve computing edit distances,
i.e. the smallest number of moves needed to transform a genome into another
one using only operations specified by a given set S. In the case of permutations,
those distances are usually left-invariant, which intuitively means that
genes can be relabelled so that either genome becomes $\iota$ without affecting the
value of the distance to compute. Under this assumption, the pairwise genome
rearrangement problem in $S_n^{\pm}$ can be viewed as a constrained sorting problem,
and the intuition behind the breakpoint graph construction is that black edges
are meant to represent the current situation (i.e. the ordering provided by $\pi$),
while grey edges are meant to represent the target situation (i.e. the ordering
provided by $\iota$). Figure 2 shows an example of a breakpoint graph. By definition,
such a graph is a collection of even-length cycles that alternate black and grey
edges. It can be easily seen that the example shown in Figure 2 decomposes
into two such cycles.

The length of a cycle in a breakpoint graph differs from the traditional
graph-theoretical definition that we mentioned earlier: it is half the number
of edges the cycle contains. Nevertheless, we will keep the terminology k-cycle 
to designate a cycle of length k, keeping in mind that its length is measured 
differently in the context of breakpoint graphs. 

3. Cycle statistics 

As is well-known (see e.g. Graham et al. [12]), the unsigned Stirling number
of the first kind ${n \brack k}$ counts the number of permutations in $S_n$ which
decompose into k disjoint cycles:

$${n \brack k} = |\{\pi \in S_n \mid c(\pi) = k\}|.$$

Figure 2: The breakpoint graph of (−5 1 2 4 −7 −3 6).

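Recall also that those numbers arise as coefficients in the series expansion of the
rising factorial

$$x^{\overline{n}} = x(x + 1) \cdots (x + n - 1) = \sum_{k=0}^{n} {n \brack k} x^k \tag{1}$$

and of the falling factorial

$$x^{\underline{n}} = x(x - 1) \cdots (x - n + 1) = \sum_{k=0}^{n} (-1)^{n-k} {n \brack k} x^k. \tag{2}$$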
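
To make Equations (1) and (2) concrete, the following Python sketch expands the rising factorial and compares its coefficients with a direct enumeration of permutations by number of cycles. All function names are illustrative assumptions, and the enumeration is only practical for small n.

```python
from itertools import permutations


def rising_factorial_coefficients(n):
    """Coefficients of x(x + 1)...(x + n - 1); entry k is the coefficient of x^k."""
    coeffs = [1]
    for j in range(n):
        shifted = [0] + coeffs                  # x * p(x)
        scaled = [j * c for c in coeffs] + [0]  # j * p(x)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


def falling_factorial_coefficients(n):
    """Coefficients of x(x - 1)...(x - n + 1), using the signs of Equation (2)."""
    rising = rising_factorial_coefficients(n)
    return [(-1) ** (n - k) * c for k, c in enumerate(rising)]


def cycle_counts_by_enumeration(n):
    """Entry k is the number of permutations of n elements with k cycles."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        seen, cycles = set(), 0
        for start in range(n):
            if start in seen:
                continue
            cycles += 1
            current = start
            while current not in seen:
                seen.add(current)
                current = perm[current]
        counts[cycles] += 1
    return counts


print(rising_factorial_coefficients(4))  # [0, 6, 11, 6, 1]
print(cycle_counts_by_enumeration(4))    # [0, 6, 11, 6, 1]
```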



Signing the elements of a permutation does not change its disjoint cycle
decomposition, so the number of signed permutations that decompose into k disjoint
cycles is $2^n {n \brack k}$. We are interested in the following analogues of the Stirling
number of the first kind, based on the cycle decomposition of the breakpoint 
graph. 

Definition 3.1. The Hultman number $S_H(n, k)$ counts the number of permutations
in $S_n$ whose breakpoint graph decomposes into k cycles:

$$S_H(n, k) = |\{\pi \in S_n \mid c(BG(\pi)) = k\}|.$$

The signed Hultman number $S_H^{\pm}(n, k)$ counts the number of permutations in
$S_n^{\pm}$ whose breakpoint graph decomposes into k cycles:

$$S_H^{\pm}(n, k) = |\{\pi \in S_n^{\pm} \mid c(BG(\pi)) = k\}|.$$
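
For small n, both kinds of Hultman numbers can be obtained by exhaustive enumeration straight from Definition 3.1. The Python sketch below does this. Its function names and the dictionary output format are assumptions made for illustration, and its running time grows like $2^n n!$, so it is only meant as a sanity check.

```python
from collections import Counter
from itertools import permutations, product


def breakpoint_graph_cycles(signed_perm):
    """Number of alternating cycles in the breakpoint graph of a signed permutation."""
    image = [0]
    for x in signed_perm:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * len(signed_perm) + 1)
    black = {}
    for i in range(0, len(image), 2):
        black[image[i]], black[image[i + 1]] = image[i + 1], image[i]
    seen, cycles = set(), 0
    for start in range(len(image)):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = w ^ 1  # grey edge {2i, 2i + 1}
    return cycles


def hultman_numbers(n):
    """Map k to S_H(n, k), by enumerating all unsigned permutations."""
    perms = permutations(range(1, n + 1))
    return dict(Counter(breakpoint_graph_cycles(p) for p in perms))


def signed_hultman_numbers(n):
    """Map k to the signed Hultman number, by enumerating all signed permutations."""
    counts = Counter()
    for p in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            counts[breakpoint_graph_cycles([s * x for s, x in zip(signs, p)])] += 1
    return dict(counts)


print(hultman_numbers(3))         # k = 2 occurs 5 times, k = 4 once
print(signed_hultman_numbers(2))  # k = 1, 2, 3 occur 4, 3 and 1 times
```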
It is clear from Definition 2.8 that the number of cycles in any breakpoint
graph is at least one and at most n + 1. Hultman numbers were so named
by Doignon and Labarre [1] after Axel Hultman, who first raised the question
of computing those numbers [13]. The authors obtained an explicit but
complicated formula for computing $S_H(n, k)$, as well as formulas for enumerating
permutations with a given "Hultman class" (the analogue of conjugacy classes
of $S_n$ based on the breakpoint graph). Bóna and Flynn [5] later observed that
they can be computed using the following much simpler expression:

$$S_H(n, k) = \begin{cases} {n+2 \brack k} \Big/ \binom{n+2}{2} & \text{if } n - k \text{ is odd,} \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

based on a formula first obtained by Kwak and Lee [14].
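
Equation (3) is straightforward to evaluate. The following Python sketch does so using the standard recurrence for unsigned Stirling numbers of the first kind. The function names are illustrative assumptions.

```python
from math import comb


def stirling_first_kind_row(n):
    """Unsigned Stirling numbers of the first kind [n, k] for k = 0, ..., n."""
    row = [1]
    for m in range(1, n + 1):
        # Recurrence: [m, k] = [m - 1, k - 1] + (m - 1) * [m - 1, k]
        previous = row
        row = []
        for k in range(m + 1):
            left = previous[k - 1] if k >= 1 else 0
            right = previous[k] if k < len(previous) else 0
            row.append(left + (m - 1) * right)
    return row


def hultman_number(n, k):
    """S_H(n, k) computed through Equation (3)."""
    if k < 1 or k > n + 1 or (n - k) % 2 == 0:
        return 0
    return stirling_first_kind_row(n + 2)[k] // comb(n + 2, 2)


print([hultman_number(4, k) for k in range(1, 6)])  # [8, 0, 15, 0, 1]
```

The values for a fixed n sum to n!, as they must since every permutation in $S_n$ is counted exactly once.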

In the next section, we present another way of obtaining an explicit formula
for the unsigned Hultman numbers, which we will use in Section 7 to derive
a new and simple proof of Equation (3). In Section 5 we will prove the first
explicit formula for computing the signed Hultman numbers.

4. A new formula for $S_H(n, k)$

We will need the following results obtained by Hanlon et al. [15], whose
notation we follow. For any fixed n in $\mathbb{N}_0$, let

$$Q_n^{\mathbb{C}}(h, \ell) = E(\mathrm{Re}(\mathrm{tr}((V V^{\dagger})^n))),$$

where V is a random $h \times \ell$ matrix with independent standard complex normal
entries, E denotes expectation, Re denotes real part, tr denotes trace and $\dagger$
denotes (conjugate) matrix transposition. For the definition and the properties of
the complex normal distribution, see for example Goodman [16].

Hanlon et al. [15] give two formulas for computing $Q_n^{\mathbb{C}}(h, \ell)$, both of which
we will need. The first formula¹ is

$$Q_n^{\mathbb{C}}(h, \ell) = \sum_{\omega \in S_n} h^{c(\omega)}\, \ell^{c(\omega \circ \omega_{(n)})}, \tag{4}$$

where $\omega_{(n)}$ is a fixed n-cycle in $S_n$. The second formula² is:

$$Q_n^{\mathbb{C}}(h, \ell) = \frac{1}{n} \sum_{i=1}^{n} \frac{(-1)^{i-1}}{(n-i)!\,(i-1)!}\, (h + n - i)^{\underline{n}}\, (\ell + n - i)^{\underline{n}}. \tag{5}$$

¹ See Corollary 2.4 p. 158 of Hanlon et al. [15].

The link between the Hultman numbers and the previous results of Hanlon
et al. is obtained using the following result of Doignon and Labarre [1].

Corollary 4.1. [1] $S_H(n, k)$ counts the number of factorisations of a fixed
(n + 1)-cycle $\beta$ into the product $\rho \circ \omega$, where $\rho$ is an (n + 1)-cycle and $\omega$ a permutation
in $S_{n+1}$ with $c(\omega) = k$.
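
Corollary 4.1 can be checked numerically for small n by enumerating all (n + 1)-cycles $\rho$ and recovering $\omega = \rho^{-1} \circ \beta$. The Python sketch below does this with permutations of {0, ..., n} stored as tuples. The choice of $\beta$ and all function names are assumptions of the sketch.

```python
from collections import Counter
from itertools import permutations


def compose(p, q):
    """Return p o q, i.e. the map i -> p[q[i]] (composition from right to left)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cycle_count(p):
    seen, cycles = set(), 0
    for start in range(len(p)):
        if start in seen:
            continue
        cycles += 1
        current = start
        while current not in seen:
            seen.add(current)
            current = p[current]
    return cycles


def factorisation_counts(n):
    """Map k to the number of factorisations beta = rho o omega with c(omega) = k."""
    size = n + 1
    beta = tuple((i + 1) % size for i in range(size))  # a fixed (n + 1)-cycle
    counts = Counter()
    for rho in permutations(range(size)):
        if cycle_count(rho) != 1:
            continue  # rho must be an (n + 1)-cycle
        omega = compose(inverse(rho), beta)  # so that rho o omega = beta
        counts[cycle_count(omega)] += 1
    return dict(counts)


print(factorisation_counts(3))  # k = 2 occurs 5 times, k = 4 once
```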

For a polynomial P(x), let $[x^k]P(x)$ denote the coefficient of the monomial
$x^k$ in P(x). We derive the following new expression for computing $S_H(n, k)$.

Theorem 4.1. For all n in $\mathbb{N}_0$, for all k in {1, 2, ..., n + 1}:

$$S_H(n, k) = \frac{1}{n+1} \sum_{i=1}^{n+1} [h^k]\,(h + n - i + 1)^{\underline{n+1}}. \tag{6}$$

Proof. By Corollary 4.1, $S_H(n, k)$ counts the number of factorisations of a fixed
(n + 1)-cycle $\beta$ into the product $\rho \circ \omega$, with $c(\rho) = 1$ and $c(\omega) = k$. This is
clearly equivalent to enumerating factorisations of $\rho^{-1}$ into the product $\omega \circ \beta^{-1}$
under the same conditions; therefore, setting $\omega_{(n+1)}$ to $\beta^{-1}$ in Equation (4), we
observe that $S_H(n, k)$ is the coefficient of the monomial $h^k \ell$ in the polynomial
$Q_{n+1}^{\mathbb{C}}(h, \ell)$, hence by Equation (5) equals:

$$\frac{1}{n+1} \sum_{i=1}^{n+1} \frac{(-1)^{i-1}}{(n-i+1)!\,(i-1)!}\, [h^k](h + n - i + 1)^{\underline{n+1}} \times [\ell](\ell + n - i + 1)^{\underline{n+1}}.$$

Since for every i in {1, 2, ..., n + 1} we have

$$[\ell](\ell + n - i + 1)^{\underline{n+1}} = [\ell]\,(\ell + n - i + 1)(\ell + n - i) \cdots (\ell + 1)\,\ell\,(\ell - 1)(\ell - 2) \cdots (\ell - (i - 1)) = (-1)^{i-1}(n - i + 1)!\,(i - 1)!,$$

the above summation simplifies to the wanted expression, which completes the
proof. □
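
The expression of Theorem 4.1 only involves integer polynomial arithmetic, as the following Python sketch illustrates. It assumes that Equation (6) reads $S_H(n, k) = \frac{1}{n+1} \sum_{i=1}^{n+1} [h^k](h + n - i + 1)^{\underline{n+1}}$. The helper names are illustrative.

```python
def falling_factorial_poly(shift, length):
    """Coefficients of (h + shift)(h + shift - 1)...(h + shift - length + 1)."""
    coeffs = [1]
    for j in range(length):
        constant = shift - j
        product = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            product[k] += constant * c  # contribution of the constant term
            product[k + 1] += c         # contribution of h
        coeffs = product
    return coeffs


def hultman_row(n):
    """Entry k is S_H(n, k), for k = 0, ..., n + 1 (entry 0 is always 0)."""
    total = [0] * (n + 2)
    for i in range(1, n + 2):
        for k, c in enumerate(falling_factorial_poly(n - i + 1, n + 1)):
            total[k] += c
    # Each coefficient of the sum is a multiple of n + 1.
    assert all(c % (n + 1) == 0 for c in total)
    return [c // (n + 1) for c in total]


print(hultman_row(3))  # [0, 0, 5, 0, 1]
```

For n = 3 the four falling factorials sum to $4h^4 + 20h^2$, which gives $S_H(3, 4) = 1$ and $S_H(3, 2) = 5$ after division by n + 1 = 4.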

Besides providing a new relation involving Hultman numbers, our new formula
will prove useful in obtaining simple proofs of known results, as we will
see in later sections. Moreover, we think that the interest of our formula also
lies in the fact that the method used to prove it extends to the signed case.



² See Theorem 2.5 p. 158 of Hanlon et al. [15].

5. An explicit formula for $S_H^{\pm}(n, k)$

We now turn our attention to the problem of computing signed Hultman
numbers, which we solve using ideas similar to those presented in the previous
section. The result is obtained by characterising the 2-regular graphs that
correspond to actual breakpoint graphs (Lemma 5.1), and then relating
that characterisation to an enumeration result by Hanlon et al. [15].

5.1. Preliminaries

Following Hanlon et al. [15], for some fixed n in $\mathbb{N}_0$, let

$$Q_n^{\mathbb{R}}(h, \ell) = E(\mathrm{tr}((V V^{\dagger})^n)),$$

where V is again a random $h \times \ell$ matrix, but this time with independent standard
real normal entries. Hanlon et al. [15] obtain two formulas for $Q_n^{\mathbb{R}}(h, \ell)$.

Let $\mathcal{F}_n$ denote the set of perfect matchings of {0, 1, 2, ..., 2n − 1}. In particular,
let $\varepsilon \in \mathcal{F}_n$ be the identity perfect matching $\{\{i, n + i\} \mid 0 \le i \le n - 1\}$.
The first formula³ for $Q_n^{\mathbb{R}}(h, \ell)$ is:

$$Q_n^{\mathbb{R}}(h, \ell) = \sum_{\delta \in \mathcal{F}_n} h^{c(\varepsilon \cup \delta)}\, \ell^{c(\delta \cup \delta_{(n)})}, \tag{7}$$

where $\delta_{(n)}$ is a fixed perfect matching such that $\varepsilon \cup \delta_{(n)}$ is hamiltonian.

The second formula is based on partitions rather than on perfect matchings. 

Definition 5.1. [17] A (integer) partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_l)$ is a finite
sequence of integers called parts such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_l \ge 0$. Its length
is the number of non-zero parts it contains, and if $\sum_{i=1}^{l} \lambda_i = n$, we call $\lambda$ a
partition of n, which we write as $\lambda \vdash n$.

We consider any two partitions to be equivalent if we obtain the same sequence
when removing all parts that equal 0. The notation $\lambda = (1^{m_1} 2^{m_2} \cdots r^{m_r})$
is also frequently used, and expresses the fact that exactly $m_i$ parts of $\lambda$
equal i. The reader must therefore bear in mind that when working with partitions,
the notation $a^b$ is more often to be understood in the previous meaning,
and not as "a to the power b".



³ See Corollary 3.6 of Hanlon et al. [15].

The second formula⁴ for $Q_n^{\mathbb{R}}(h, \ell)$ is:

$$Q_n^{\mathbb{R}}(h, \ell) = \sum_{\lambda} c_\lambda(2)\, F_\lambda(h)\, F_\lambda(\ell), \tag{8}$$

where:

• $\lambda$ ranges over all partitions of n of the form $(a, b, 1^{n-a-b})$, with either
$a \ge b \ge 1$ or a = n and b = 0,

• the function $F_\lambda : \mathbb{R} \to \mathbb{R}$ is defined as:

$$F_\lambda(x) = 2^{a-b}\,(x/2 + a - 1)^{\underline{a-b}}\,(x + 2b - 2)^{\underline{n-a+b}}, \tag{9}$$

• and the coefficients $c_\lambda(2)$ are given as follows:

$$c_\lambda(2) = \frac{(-1)^{n+a-b+1}\, 2^{a-b+1}\, n\, (2a - 2b + 1)\, (a - 1)!}{(n + a - b + 1)^{\underline{2}}\, (n - a + b)^{\underline{2}}\, (n - a - b)!\, (2a - 1)!\, (b - 1)!} \tag{10}$$

if $\lambda = (a, b, 1^{n-a-b})$ with $a \ge b \ge 1$, and

$$c_\lambda(2) = \frac{2^n\, n!}{(2n)!} \quad \text{if } \lambda = (n). \tag{11}$$

The numbers $c_\lambda(2)$ appear as coefficients in the expansion of the n-th power-sum
function in terms of zonal polynomials. For definitions and details, see for
example Macdonald [17].

5.2. Characterising valid breakpoint graphs 

Recall that a breakpoint graph is a 2-regular graph that is the union of two 
perfect matchings of {0, 1, . . . , 2n + 1}. We now make the connection between 
signed Hultman numbers and the previously mentioned results explicit. 

Definition 5.2. A configuration is the union of two perfect matchings $\delta_B$ and
$\delta_G$ of {0, 1, ..., 2n + 1}, where $\delta_G = \{\{2i, 2i + 1\} \mid 0 \le i \le n\}$.

Note that the above definition only slightly generalises Definition 2.8, by
allowing any choice of a perfect matching for $\delta_B$, whereas there are implicit
constraints on the choice of $\delta_B$ in the definition of the breakpoint graph. By
definition, every breakpoint graph is a configuration, but not every configuration
is a breakpoint graph, as we will see shortly. The following notion will
help us characterise configurations that are breakpoint graphs.

⁴ See Theorem 5.4 of Hanlon et al. [15].

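Definition 5.3. The complement of a configuration $C = \delta_B \cup \delta_G$, denoted by
$\overline{C} = \delta_B \cup \overline{\delta_G}$, is obtained by replacing $\delta_G$ with
$\overline{\delta_G} = \{\{2i - 1, 2i\} \mid 1 \le i \le n\} \cup \{\{0, 2n + 1\}\}$.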

Before stating our characterisation of breakpoint graphs, we wish to stress
that Elias and Hartman [18] previously used a similar but different notion of
complementation (they replace $\delta_B$ with $\overline{\delta_B}$, whose definition we will omit here,
whereas we replace $\delta_G$ with $\overline{\delta_G}$) to characterise valid breakpoint graphs of
unsigned permutations. This is not enough for our purpose, which is why we
generalise their result below to encompass signed permutations as well.

Lemma 5.1. A configuration $\delta_B \cup \delta_G$ is the breakpoint graph of some signed
permutation $\pi$ if and only if the complement configuration $\delta_B \cup \overline{\delta_G}$ is hamiltonian.

Proof. We can easily see that the complement $\overline{BG(\pi)}$ of a breakpoint graph is
hamiltonian, since its edges are $\{\{\pi'_i, \pi'_{i+1}\} \mid 0 \le i \le 2n\} \cup \{\{0, 2n + 1\}\}$.

Conversely, if the complement $\delta_B \cup \overline{\delta_G}$ of a configuration is hamiltonian,
then we can recover the elements of an unsigned permutation
$\pi' = (0\ \pi'_1\ \pi'_2\ \cdots\ \pi'_{2n}\ 2n + 1)$ by visiting the vertices along the hamiltonian cycle as follows: take
$0 = \pi'_0$ as starting point, and follow the edge in $\delta_B$ that is incident to 0, setting
the value of $\pi'_1$ to the other endpoint of that edge. We then keep following the
cycle, assigning the label of the i-th encountered vertex to $\pi'_i$ as we go, ending with
$2n + 1 = \pi'_{2n+1}$. Note that for every $0 \le i < n$, the edge $\{\pi'_{2i+1}, \pi'_{2i+2}\}$ belongs
to $\overline{\delta_G}$, and therefore we have $|\pi'_{2i+1} - \pi'_{2i+2}| = 1$. From the unsigned permutation
$\pi'$, we can therefore easily recover the corresponding signed permutation $\pi$ in
$S_n^{\pm}$, whose breakpoint graph is $\delta_B \cup \delta_G$. □
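
Lemma 5.1 suggests a direct, if inefficient, way of enumerating breakpoint graphs: generate every perfect matching $\delta_B$ of {0, ..., 2n + 1}, keep those whose complement configuration is hamiltonian, and count the cycles formed with the grey edges. The Python sketch below implements this idea for small n. The function names and the dictionary representation of matchings are assumptions made for illustration.

```python
from collections import Counter


def perfect_matchings(vertices):
    """Yield every perfect matching of `vertices` as a dict vertex -> partner."""
    if not vertices:
        yield {}
        return
    first, rest = vertices[0], vertices[1:]
    for index, partner in enumerate(rest):
        remaining = rest[:index] + rest[index + 1:]
        for matching in perfect_matchings(remaining):
            extended = dict(matching)
            extended[first], extended[partner] = partner, first
            yield extended


def count_cycles(first_matching, second_matching):
    """Number of cycles in the union of two perfect matchings."""
    seen, cycles = set(), 0
    for start in first_matching:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = first_matching[v]
            seen.add(w)
            v = second_matching[w]
    return cycles


def valid_configurations(n):
    """Map k to the number of valid configurations having k alternating cycles."""
    size = 2 * n + 2
    grey = {v: v ^ 1 for v in range(size)}  # edges {2i, 2i + 1}
    complement = {0: size - 1, size - 1: 0}  # edge {0, 2n + 1}
    for i in range(1, n + 1):                # edges {2i - 1, 2i}
        complement[2 * i - 1], complement[2 * i] = 2 * i, 2 * i - 1
    counts = Counter()
    for black in perfect_matchings(list(range(size))):
        if count_cycles(black, complement) == 1:  # hamiltonian complement
            counts[count_cycles(black, grey)] += 1
    return dict(counts)


print(valid_configurations(2))  # k = 1, 2, 3 occur 4, 3 and 1 times
```

For each n, the number of matchings that pass the test is $2^n n!$, the number of signed permutations of n elements.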

Figure 3(a) shows the complement of the breakpoint graph of Figure 2,
which is hamiltonian. On the other hand, the complement of the
configuration shown in Figure 3(b) is not hamiltonian. We now show that
Equation (7) remains valid when replacing the identity perfect matching $\varepsilon$ with the
perfect matching $\delta_G$ and choosing $\overline{\delta_G}$ as the fixed perfect matching $\delta_{(n+1)}$, which
clearly satisfies the condition that $\delta_G \cup \overline{\delta_G}$ is hamiltonian as required. The proof
can be easily generalised to any choice of a perfect matching $\tau_{(n+1)}$ such that
$\delta_G \cup \tau_{(n+1)}$ is hamiltonian, but the following statement will be sufficient for our
purposes.

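Lemma 5.2. For any n in $\mathbb{N}_0$:

$$Q_{n+1}^{\mathbb{R}}(h, \ell) = \sum_{\tau \in \mathcal{F}_{n+1}} h^{c(\delta_G \cup \tau)}\, \ell^{c(\tau \cup \overline{\delta_G})}. \tag{12}$$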

Proof. First, let us note that every perfect matching $\phi$ in $\mathcal{F}_{n+1}$ can be seen
as a fixed-point-free involution, i.e. a permutation of {0, 1, 2, ..., 2n + 1} that
decomposes into a collection of 2-cycles only, by viewing each edge of $\phi$ as a
2-cycle. Therefore, conjugating $\phi$ by any permutation of the same number of
elements is a well-defined operation that simply renames the endpoints of the
given edges. Let $\mu$ be the permutation defined by

$$\mu : \{0, 1, \ldots, 2n + 1\} \to \{0, 1, \ldots, 2n + 1\} : i \mapsto \mu(i) = \begin{cases} i/2 & \text{if } i \text{ is even,} \\ (i + 2n + 1)/2 & \text{otherwise.} \end{cases}$$

As the example in Figure 4 shows, $\delta_G$ can be mapped onto $\varepsilon = \mu \circ \delta_G \circ \mu^{-1}$,
and we fix $\delta_{(n+1)} = \mu \circ \overline{\delta_G} \circ \mu^{-1}$. Finally, observe that given any two perfect
matchings $\phi_1$ and $\phi_2$ in $\mathcal{F}_{n+1}$, the graphs $\mu \circ \phi_1 \circ \mu^{-1} \cup \mu \circ \phi_2 \circ \mu^{-1}$ and $\phi_1 \cup \phi_2$
are isomorphic, and hence $c(\mu \circ \phi_1 \circ \mu^{-1} \cup \mu \circ \phi_2 \circ \mu^{-1}) = c(\phi_1 \cup \phi_2)$. Taking
$\delta = \mu \circ \tau \circ \mu^{-1}$, the following relations hold:

• $c(\varepsilon \cup \delta) = c(\mu \circ \delta_G \circ \mu^{-1} \cup \mu \circ \tau \circ \mu^{-1}) = c(\delta_G \cup \tau)$,

• $c(\delta \cup \delta_{(n+1)}) = c(\mu \circ \tau \circ \mu^{-1} \cup \mu \circ \overline{\delta_G} \circ \mu^{-1}) = c(\tau \cup \overline{\delta_G})$,

• $c(\varepsilon \cup \delta_{(n+1)}) = c(\mu \circ \delta_G \circ \mu^{-1} \cup \mu \circ \overline{\delta_G} \circ \mu^{-1}) = c(\delta_G \cup \overline{\delta_G}) = 1$,

and the formula in the statement follows from the above relations, the bijectivity
of conjugation, and Equation (7). □
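
The relabelling used in this proof is easy to check by machine. The Python sketch below assumes that $\mu$ sends an even i to i/2 and an odd i to (i + 2n + 1)/2, and represents a matching as a set of two-element frozensets. These conventions and the function names are illustrative assumptions.

```python
def mu(i, n):
    """The relabelling mu on {0, ..., 2n + 1}: evens go to 0..n, odds to n+1..2n+1."""
    return i // 2 if i % 2 == 0 else (i + 2 * n + 1) // 2


def conjugate(edges, n):
    """Conjugating a matching by mu simply renames the endpoints of its edges."""
    return {frozenset(mu(v, n) for v in edge) for edge in edges}


def grey_edges(n):
    return {frozenset((2 * i, 2 * i + 1)) for i in range(n + 1)}


def complement_grey_edges(n):
    edges = {frozenset((2 * i - 1, 2 * i)) for i in range(1, n + 1)}
    return edges | {frozenset((0, 2 * n + 1))}


def identity_matching(n):
    """The identity perfect matching of {0, ..., 2n + 1}: edges {i, n + 1 + i}."""
    return {frozenset((i, n + 1 + i)) for i in range(n + 1)}


n = 4
print([mu(i, n) for i in range(2 * n + 2)])  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
print(conjugate(grey_edges(n), n) == identity_matching(n))  # True
```

With n = 4 the images of 0, 1, ..., 9 under this map are 0, 5, 1, 6, 2, 7, 3, 8, 4, 9, and the grey edges are sent onto the identity perfect matching.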



Figure 4: Mapping $\delta_G$ (resp. $\overline{\delta_G}$) onto $\varepsilon$ (resp. $\delta_{(n+1)}$) by conjugating them by
$\mu$ = (0 5 1 6 2 7 3 8 4 9).

5.3. Enumerating breakpoint graphs with k cycles

Lemma 5.1 implies that enumerating signed permutations of n elements
whose breakpoint graph decomposes into k alternating cycles is equivalent to
enumerating perfect matchings $\tau$ in $\mathcal{F}_{n+1}$ verifying $c(\delta_G \cup \tau) = k$ and
$c(\tau \cup \overline{\delta_G}) = 1$, where $\delta_G$ is defined in Definition 2.8 and $\overline{\delta_G}$ is defined in
Definition 5.3. Using Lemma 5.2, we thus obtain the following.

Remark 5.1. For every k in {1, 2, ..., n + 1}, $S_H^{\pm}(n, k)$ is the coefficient of the
monomial $h^k \ell$ in $Q_{n+1}^{\mathbb{R}}(h, \ell)$.

The second expression for $Q_{n+1}^{\mathbb{R}}(h, \ell)$ given in Equation (8) allows us to
obtain the following explicit formula for $S_H^{\pm}(n, k)$.
Theorem 5.1. For all n in $\mathbb{N}_0$, for all k in {1, 2, ..., n + 1}:

$$S_H^{\pm}(n, k) = \sum_{\lambda} c_\lambda(2) \times [h^k]F_\lambda(h) \times \frac{(-1)^{n-a-b}\, 2^{a-b-1}\, (2b)!\, (a - 1)!\, (n - a - b + 2)!}{(2b - 1)\, b!}, \tag{13}$$

where $\lambda$ ranges over all partitions of n + 1 of the form $(a, b, 1^{n-a-b+1})$, with
$a \ge b \ge 1$ or a = n + 1, b = 0, and where the function $F_\lambda(\cdot)$ as well as the
coefficients $c_\lambda(2)$ follow the definitions previously given in Section 5.1⁵.



Proof. Remark 5.1 and Equation (8) yield

$$S_H^{\pm}(n, k) = \sum_{\lambda} c_\lambda(2) \times [h^k]F_\lambda(h) \times [\ell]F_\lambda(\ell), \tag{14}$$

where the sum over $\lambda$, the coefficients $c_\lambda(2)$ and the function $F_\lambda(\cdot)$ are as in the
statement of the present result. For a partition $\lambda$ of the form $(a, b, 1^{n-a-b+1})$,
with $a \ge b \ge 1$ or a = n + 1, b = 0, it is easy to see that

$$[\ell]F_\lambda(\ell) = \frac{(-1)^{n-a-b}\, 2^{a-b-1}\, (2b)!\, (a - 1)!\, (n - a - b + 2)!}{(2b - 1)\, b!}. \tag{15}$$

Indeed:

1. if $\lambda = (a, b, 1^{n-a-b+1})$ with $a \ge b \ge 1$, we have

$$F_\lambda(\ell) = 2^{a-b}\,(\ell/2 + a - 1)(\ell/2 + a - 2) \cdots (\ell/2 + b) \times (\ell + 2b - 2)(\ell + 2b - 3) \cdots (\ell + 1) \times \ell(\ell - 1) \cdots (\ell - (n - a - b + 2)).$$

The coefficient of $\ell$ in the above expression equals

$$[\ell]F_\lambda(\ell) = 2^{a-b}\, \frac{(a - 1)!}{(b - 1)!}\, (2b - 2)!\, (-1)^{n-a-b+2}\, (n - a - b + 2)! = \frac{(-1)^{n-a-b}\, 2^{a-b-1}\, (2b)!\, (a - 1)!\, (n - a - b + 2)!}{(2b - 1)\, b!}.$$

2. if $\lambda = (n + 1)$, i.e. a = n + 1 and b = 0, we have

$$F_{(n+1)}(\ell) = 2^{n+1}\,(\ell/2 + n)^{\underline{n+1}}\,(\ell - 2)^{\underline{0}} = 2^{n+1}\,(\ell/2 + n)(\ell/2 + n - 1) \cdots (\ell/2 + 1)\,\ell/2,$$

so $[\ell]F_{(n+1)}(\ell) = 2^n n!$, which verifies Equation (15).

The proof then follows from Equations (14) and (15). □

⁵ With the slight modification that n needs to be replaced with n + 1.
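
The formula can be evaluated with exact rational arithmetic. The Python sketch below implements Equation (14) under the assumption that, for a partition $(a, b, 1^{m-a-b})$ of m = n + 1, the function $F_\lambda$ and the coefficients $c_\lambda(2)$ take the form encoded in the functions `F` and `c2`. These names, and the reading of the falling factorials of order 2 in the denominator of $c_\lambda(2)$, are assumptions of this sketch and should be checked against Hanlon et al. [15].

```python
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def falling_poly(slope, offset, length):
    """Coefficients of the product of (slope * x + offset - j) for j = 0..length-1."""
    poly = [Fraction(1)]
    for j in range(length):
        poly = poly_mul(poly, [Fraction(offset - j), Fraction(slope)])
    return poly


def falling_int(x, length):
    out = 1
    for j in range(length):
        out *= x - j
    return out


def F(a, b, m):
    """Assumed form: 2^(a-b) (x/2 + a - 1)_(a-b) (x + 2b - 2)_(m-a+b)."""
    poly = poly_mul(falling_poly(Fraction(1, 2), a - 1, a - b),
                    falling_poly(1, 2 * b - 2, m - a + b))
    return [Fraction(2) ** (a - b) * c for c in poly]


def c2(a, b, m):
    """Assumed form of the coefficient c_lambda(2) for lambda = (a, b, 1^(m-a-b))."""
    if b == 0:
        return Fraction(2 ** m * factorial(m), factorial(2 * m))
    sign = -1 if (m + a - b + 1) % 2 else 1
    num = sign * 2 ** (a - b + 1) * m * (2 * a - 2 * b + 1) * factorial(a - 1)
    den = (falling_int(m + a - b + 1, 2) * falling_int(m - a + b, 2)
           * factorial(m - a - b) * factorial(2 * a - 1) * factorial(b - 1))
    return Fraction(num, den)


def signed_hultman_row(n):
    """Signed Hultman numbers for k = 1, ..., n + 1, following Equation (14)."""
    m = n + 1
    shapes = [(m, 0)] + [(a, b) for b in range(1, m // 2 + 1)
                         for a in range(b, m - b + 1)]
    total = [Fraction(0)] * (m + 1)
    for a, b in shapes:
        f = F(a, b, m)
        weight = c2(a, b, m) * f[1]  # f[1] is the coefficient of the linear term
        for k in range(m + 1):
            total[k] += weight * f[k]
    return [int(value) for value in total[1:]]


print(signed_hultman_row(3))  # [20, 21, 6, 1]
```

The sketch reads the linear coefficient directly from the expanded polynomial rather than using the closed form of Equation (15). Both give the same value, and the former avoids the special case b = 0.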

We conclude this section with Table 1, which shows a few experimental
values of the signed Hultman numbers. These values were previously obtained
by the first author using the method described in a previous paper of hers [8].

Note that for k = 1, the sequence defined by $S_H^{\pm}(n, 1)$ for n = 1, 2, ...
corresponds to sequence A001171 in the On-Line Encyclopedia of Integer
Sequences [19]. As we will see in the next section, other known sequences also
appear in that table.

6. Special cases 



The expression obtained in Theorem 5.1 allows us to compute $S_H^{\pm}(n, k)$ for all
valid values of n and k, but we must acknowledge that even though the formula
is suited for practical use, it is unfortunately quite complicated and difficult to
manipulate. Simpler expressions do however exist for some particular cases, as
we will show below. We will rely a lot on Lemma 5.1 in this section, and decide
to use a slightly different layout for the breakpoint graph: labels are omitted
for clarity, and grey edges rather than black edges are now laid out on a circle, 
so that computing the complement of a given configuration simply amounts to 
shifting grey edges sideways by one position. In order to make verifications 
easier for the reader, we also draw edges in the complement as dotted edges. 
Table 1: A few values of the signed Hultman numbers $S_H^{\pm}(n, k)$ (rows: n, columns: k).

n\k            1            2            3           4           5          6         7        8      9    10  11  12
  1            1            1
  2            4            3            1
  3           20           21            6           1
  4          148          160           65          10           1
  5         1348         1620          701         155          15          1
  6        15104        19068         9324        2247         315         21         1
  7       198144       264420       138016       38029        5908        574        28        1
  8      2998656      4166880      2325740      692088      124029      13524       966       36      1
  9     51290496     74011488     43448940    13945700     2723469     344961     27930     1530     45     1
 10    979732224   1459381440    897020784   305142068    64711856    8996295    850905    53262   2310    55   1
 11  20661458688  31674232128  20241273264  7255047116  1640552028  249029717  26004330  1910403  95304  3355  66   1

The following particular cases are easy to verify:

1. $S_H^{\pm}(n, k) = 0$ for all k < 1 and all k > n + 1 (trivial);

2. $S_H^{\pm}(n, n + 1) = 1$, since the only permutation whose breakpoint graph
decomposes into n + 1 cycles is $\iota$;

3. $S_H^{\pm}(n, n) = \binom{n+1}{2}$, since enumerating such permutations comes down to
counting breakpoint graphs whose cycles all have length 1, except for one
that has length 2. This in turn is equivalent to enumerating the ways in
which one can connect any two of the n + 1 grey edges by black edges so
as to obtain a valid configuration (with respect to Lemma 5.1); as can be
verified on Figure 5, only one of the two possible choices of black edges
(namely, configuration (b)) is valid, and the equality follows from the fact
that there are $\binom{n+1}{2}$ possible ways to select two grey edges out of n + 1.

Figure 5: The two forms of 2-cycles that may arise in a breakpoint graph. Only four 1-cycles
are shown in each graph, but there can be any number of them.



We now show how one can obtain a simple and explicit formula for $S_H^{\pm}(n, n - 1)$.
Although the formula is quite simple, we hope that the proof will convince
the reader of the shortcomings of a case analysis in this setting.

Proposition 6.1. For all $n \ge 1$, we have $S_H^{\pm}(n, n - 1) = 5\binom{n+1}{4} + 4\binom{n+1}{3}$.

Proof. Note that S H {n, n — 1) is the number of permutations whose breakpoint 
graph contains either one 3-cycle or two 2-cycles, all other cycles having length 
1 in both cases: 

1. the number of permutations satisfying the first condition is the number of 
ways to connect three grey edges in the breakpoint graph in su ch a way 



that the complement configuration is hamiltonian (see Lemma l5.lh . As 
Figure^ shows, there are eight possible ways to create such a configuration, 
only four of which are valid (namely, configurations (a), (b), (c) and (d)). 
The reader can easily verify that the other configurations are invalid by 
replacing grey edges with dotted edges. 

We obtain the rightmost term in the wanted expression by noting that 
only four of the eight possible 3-cycles are valid, and there are ("J 1 ) ways 
to select three grey edges out of n+ 1. 
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(«) 




•a^o' 'a^o' '-a^o' 'o^d 

M (/) (a) (M 



Figure 6: All possible forms of 3-cycles that may arise in a breakpoint graph. Only three 
1-cycles are shown in each graph, but there can be any number of them. 



2. permutations satisfying the second condition can be constructed by
choosing four grey edges, then connecting them by pairs while
ensuring that the resulting configuration is valid. Figure 7 shows all
possible configurations with two cycles of length two.

The reader can again easily verify the validity of all configurations by
replacing grey edges with dotted edges. Only five possible configurations
with two 2-cycles are valid (namely, configurations (b), (f), (£), (k) and
(l)) out of the twelve shown in Figure 7, and there are $\binom{n+1}{4}$ ways to select
the four grey edges out of n + 1, which yields the leftmost term in the
wanted expression and completes the proof.

□ 
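Proposition 6.1 can be checked by exhaustive enumeration for small n. The following Python sketch is ours, not part of the original proof. It assumes the usual construction of the breakpoint graph of a signed permutation (each element is doubled, the sequence is framed by 0 and 2n + 1, black edges join consecutive pairs of positions and grey edges join 2i and 2i + 1); the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product
from math import comb


def cycle_count(signed_perm):
    """Number of cycles in the breakpoint graph of a signed permutation."""
    n = len(signed_perm)
    # Unsigned image: +v -> (2v-1, 2v), -v -> (2v, 2v-1), framed by 0 and 2n+1.
    seq = [0]
    for v in signed_perm:
        seq += [2 * v - 1, 2 * v] if v > 0 else [-2 * v, -2 * v - 1]
    seq.append(2 * n + 1)
    # Black edges join seq[2i] and seq[2i+1]; grey edges join 2i and 2i+1.
    black = {}
    for i in range(n + 1):
        a, b = seq[2 * i], seq[2 * i + 1]
        black[a], black[b] = b, a
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.update((v, black[v]))
            v = black[v] ^ 1  # follow the grey edge 2i <-> 2i+1
    return cycles


def signed_hultman_row(n):
    """Counter mapping k to the number of signed permutations with k cycles."""
    row = Counter()
    for p in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            row[cycle_count([s * v for s, v in zip(signs, p)])] += 1
    return row


for n in range(1, 5):
    row = signed_hultman_row(n)
    # Enumerated value of S(n, n-1) next to the closed form of Proposition 6.1.
    print(n, row[n - 1], 5 * comb(n + 1, 4) + 4 * comb(n + 1, 3))
```

The enumeration grows as $2^n n!$, so this is only meant as a sanity check for very small n.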



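7. Simpler proofs of previous results

Theorem 4.1 allows us to obtain a new proof of Bona and Flynn's formula
(stated earlier in this paper).

Corollary 7.1. [5] For all n in $\mathbb{N}_0$:

$$S_H(n,k) = \begin{cases} \genfrac{[}{]}{0pt}{}{n+2}{k} \Big/ \binom{n+2}{2} & \text{if } n-k \text{ is odd},\\ 0 & \text{otherwise.} \end{cases}$$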



Proof. The key idea of the proof is the fact that, for every i = 1, 2, . . . , n + 1,
we have

$$(h+n-i+1)^{\underline{n+1}} = \frac{1}{n+2}\left((h-i+1)^{\overline{n+2}} - (h-i)^{\overline{n+2}}\right) \tag{16}$$


Figure 7: All possible pairs of 2-cycles that may arise in a breakpoint graph. Only four 
1-cycles are shown in each graph, but there can be any number of them. 



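since

$$\begin{aligned}
\frac{1}{n+2}\left((h-i+1)^{\overline{n+2}} - (h-i)^{\overline{n+2}}\right)
&= \frac{1}{n+2}\big((h-i+1)\cdots(h+n-i+2) - (h-i)\cdots(h+n-i+1)\big)\\
&= \frac{1}{n+2}\,(h-i+1)\cdots(h+n-i+1)\,\big((h+n-i+2)-(h-i)\big)\\
&= (h+n-i+1)^{\underline{n+1}}.
\end{aligned}$$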



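Summing over i in Equation (16), we obtain

$$\begin{aligned}
\frac{1}{n+1}\sum_{i=1}^{n+1}(h+n-i+1)^{\underline{n+1}}
&= \frac{1}{(n+1)(n+2)}\sum_{i=1}^{n+1}\left((h-i+1)^{\overline{n+2}} - (h-i)^{\overline{n+2}}\right)\\
&= \frac{1}{(n+1)(n+2)}\left(h^{\overline{n+2}} - (h-n-1)^{\overline{n+2}}\right)\\
&= \frac{1}{(n+1)(n+2)}\left(h^{\overline{n+2}} - h^{\underline{n+2}}\right).
\end{aligned}$$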



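By the expansions of rising and falling factorial powers in terms of Stirling
numbers of the first kind recalled earlier, the coefficient of $h^k$ in
$h^{\overline{n+2}}$ is $\genfrac{[}{]}{0pt}{}{n+2}{k}$ and the coefficient of $h^k$ in
$h^{\underline{n+2}}$ is $(-1)^{n-k}\genfrac{[}{]}{0pt}{}{n+2}{k}$. Using Theorem 4.1, we conclude that

$$S_H(n,k) = \begin{cases} \dfrac{2}{(n+1)(n+2)}\genfrac{[}{]}{0pt}{}{n+2}{k} & \text{if } n-k \text{ is odd},\\ 0 & \text{otherwise,} \end{cases}$$

which completes the proof. □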
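The closed form of Corollary 7.1 is easy to compare with a direct count for small n. The sketch below is ours and only illustrative. It assumes that the breakpoint graph of an unsigned permutation has the same cycles as the breakpoint graph of the signed permutation obtained by giving every element a positive sign; all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import comb


def stirling_row(m):
    """Unsigned Stirling numbers of the first kind: row[k] = c(m, k)."""
    row = [1]  # c(0, 0) = 1
    for size in range(1, m + 1):
        prev = row + [0]
        # c(size, k) = c(size - 1, k - 1) + (size - 1) * c(size - 1, k)
        row = [(prev[k - 1] if k else 0) + (size - 1) * prev[k]
               for k in range(size + 1)]
    return row


def hultman_formula(n, k):
    """Closed form of Corollary 7.1 for S_H(n, k)."""
    if (n - k) % 2 == 0:
        return 0
    return stirling_row(n + 2)[k] // comb(n + 2, 2)


def cycle_count(perm):
    """Cycles in the breakpoint graph of an unsigned permutation of 1..n."""
    n = len(perm)
    # Double every element as if it carried a positive sign, then frame.
    seq = [0] + [x for v in perm for x in (2 * v - 1, 2 * v)] + [2 * n + 1]
    black = {}
    for i in range(n + 1):
        a, b = seq[2 * i], seq[2 * i + 1]
        black[a], black[b] = b, a
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.update((v, black[v]))
            v = black[v] ^ 1  # grey edge 2i <-> 2i+1
    return cycles


def hultman_row_bruteforce(n):
    """Counter mapping k to the number of permutations with k cycles."""
    return Counter(cycle_count(p) for p in permutations(range(1, n + 1)))


for n in range(1, 6):
    brute = hultman_row_bruteforce(n)
    print(n, [brute[k] for k in range(1, n + 2)],
          [hultman_formula(n, k) for k in range(1, n + 2)])
```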



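Theorem 4.1 also allows us to obtain a simple proof of a binomial identity
previously obtained by Sury et al.

Corollary 7.2. For all n in $\mathbb{N}$:

$$\sum_{i=0}^{n} \frac{(-1)^i}{\binom{n}{i}} = \left(1 + (-1)^n\right)\frac{n+1}{n+2}.$$

Proof. Setting k to 1 in Theorem 4.1 yields

$$S_H(n,1) = \frac{1}{n+1}\sum_{i=1}^{n+1} (-1)^{i-1}(i-1)!\,(n-i+1)! = \frac{n!}{n+1}\sum_{i=0}^{n} \frac{(-1)^i}{\binom{n}{i}}.$$

On the other hand, as previously observed (see footnote 6) by Doignon and
Labarre [1], we have:

$$S_H(n,1) = \begin{cases} \dfrac{2\,n!}{n+2} & \text{if } n \text{ is even},\\ 0 & \text{otherwise,} \end{cases}$$

which completes the proof. □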
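The identity of Corollary 7.2 can be checked with exact rational arithmetic. This is a small sketch of ours; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def alternating_reciprocal_sum(n):
    """Left-hand side: sum over i of (-1)^i / binomial(n, i)."""
    return sum(Fraction((-1) ** i, comb(n, i)) for i in range(n + 1))


def closed_form(n):
    """Right-hand side: (1 + (-1)^n) (n + 1) / (n + 2)."""
    return Fraction((1 + (-1) ** n) * (n + 1), n + 2)


for n in range(7):
    print(n, alternating_reciprocal_sum(n), closed_form(n))
```

For odd n both sides vanish, which mirrors the fact that $S_H(n,1) = 0$ when n is odd.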

8. Expected value and variance of the Hultman numbers 

In order to gain more insight into the distribution of the Hultman numbers, 
we will now investigate the question of computing the expected value and vari- 
ance of the number of cycles in breakpoint graphs, both for unsigned and for 
signed permutations. 

It will also be interesting to see how these values compare to the expected
value and variance of the number of cycles in the usual disjoint cycle
decomposition of a uniform random unsigned permutation $\pi$ in $S_n$. We recall
here (see e.g. Wilf [21]) the exact values of these quantities:

$$E(c(\pi)) = H_n,$$

$$\mathrm{Var}(c(\pi)) = H_n - \sum_{k=1}^{n} \frac{1}{k^2},$$

6 The result can also be easily derived from Equation Q). 
as well as their asymptotic behaviour when $n \to \infty$:

$$E(c(\pi)) = \log(n) + \gamma + o(1), \tag{17}$$

$$\mathrm{Var}(c(\pi)) = \log(n) + \gamma - \frac{\pi^2}{6} + o(1), \tag{18}$$

where $H_n$ denotes the n-th harmonic number $H_n = \sum_{i=1}^{n} \frac{1}{i}$ and $\gamma$ denotes the
Euler-Mascheroni constant. As usual, o(1) denotes a quantity that converges to
0 as $n \to \infty$.
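The exact values recalled above can be reproduced from the Stirling numbers of the first kind, which count permutations of n elements by their number of disjoint cycles. The following sketch is ours and uses hypothetical helper names.

```python
from fractions import Fraction


def stirling_row(n):
    """Unsigned Stirling numbers of the first kind: row[k] = c(n, k)."""
    row = [1]  # c(0, 0) = 1
    for size in range(1, n + 1):
        prev = row + [0]
        # c(size, k) = c(size - 1, k - 1) + (size - 1) * c(size - 1, k)
        row = [(prev[k - 1] if k else 0) + (size - 1) * prev[k]
               for k in range(size + 1)]
    return row


def harmonic(n, power=1):
    """Generalised harmonic number: sum of 1 / k**power for k = 1..n."""
    return sum((Fraction(1, k ** power) for k in range(1, n + 1)), Fraction(0))


def cycle_mean_and_variance(n):
    """Exact mean and variance of the number of disjoint cycles in S_n."""
    row = stirling_row(n)
    total = sum(row)  # equals n!
    mean = Fraction(sum(k * c for k, c in enumerate(row)), total)
    second = Fraction(sum(k * k * c for k, c in enumerate(row)), total)
    return mean, second - mean ** 2


for n in (1, 2, 5, 10):
    mean, var = cycle_mean_and_variance(n)
    print(n, mean == harmonic(n), var == harmonic(n) - harmonic(n, 2))
```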

8.1. The unsigned case

Bona and Flynn [5] already proved a formula for computing the expected
number of cycles in the breakpoint graph of a uniform random unsigned
permutation. In this section we provide a new proof of their result and also give
an explicit formula for the variance of this distribution. We start by computing
the generating function of the Hultman numbers.

Lemma 8.1. For all $n \in \mathbb{N}_0$, we have:

$$F(x) = \sum_{k=0}^{n+1} S_H(n,k)\,x^k = \frac{x^{\overline{n+2}} - x^{\underline{n+2}}}{2\binom{n+2}{2}}.$$

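Proof. The derivation is straightforward:

$$\begin{aligned}
F(x) = \sum_{k=0}^{n+1} S_H(n,k)\,x^k
&= \frac{1}{2\binom{n+2}{2}} \sum_{k=0}^{n+2} \left(1 - (-1)^{n+2-k}\right)\genfrac{[}{]}{0pt}{}{n+2}{k} x^k
&& \text{(by Bona and Flynn's formula)}\\
&= \frac{1}{2\binom{n+2}{2}} \left( \sum_{k=0}^{n+2} \genfrac{[}{]}{0pt}{}{n+2}{k} x^k - \sum_{k=0}^{n+2} (-1)^{n+2-k}\genfrac{[}{]}{0pt}{}{n+2}{k} x^k \right)\\
&= \frac{x^{\overline{n+2}} - x^{\underline{n+2}}}{2\binom{n+2}{2}}
&& \text{(by the expansions of rising and falling factorials).}
\end{aligned}$$

□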
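As a small worked instance of Lemma 8.1 (our own illustration, not part of the original derivation), take n = 2:

```latex
\begin{align*}
x^{\overline{4}} &= x(x+1)(x+2)(x+3) = x^4 + 6x^3 + 11x^2 + 6x,\\
x^{\underline{4}} &= x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x,\\
F(x) &= \frac{x^{\overline{4}} - x^{\underline{4}}}{2\binom{4}{2}}
      = \frac{12x^3 + 12x}{12} = x^3 + x.
\end{align*}
```

This agrees with a direct count: of the two permutations of two elements, the identity has a breakpoint graph with three cycles and the other permutation has a breakpoint graph with a single cycle.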



Knowing the generating function allows us to easily derive the expected value 
and the variance of the number of cycles in the breakpoint graph of a uniform 
random unsigned permutation. For this purpose, we first need to compute some 
derivatives of the generating function. 
Lemma 8.2. For all $n \in \mathbb{N}_0$, we have:

$$F(1) = n!,$$

$$F'(1) = \frac{1}{(n+1)(n+2)}\left\{(n+2)!\,H_{n+2} + (-1)^{n-1}\,n!\right\},$$

$$F''(1) = \frac{1}{(n+1)(n+2)}\left\{(n+2)!\left(H_{n+2}^2 - \sum_{k=1}^{n+2}\frac{1}{k^2}\right) + 2(-1)^n\,n!\,(H_n - 1)\right\}.$$

Proof. We obtain the three expressions separately.

1. For the first expression, note that, by definition, $F(1) = \sum_{k=1}^{n+1} S_H(n,k)$,
which is simply the total number of permutations of n elements and
therefore equals n!.

2. We simplify the computation of F'(x) by writing $x^{\underline{n+2}} = (x-1)g(x)$, with

$$g(x) = x \prod_{i=2}^{n+1} (x - i).$$

With this notation we have

$$F(x) = \frac{x^{\overline{n+2}} - (x-1)g(x)}{2\binom{n+2}{2}}.$$

We thus obtain

$$F'(x) = \frac{1}{2\binom{n+2}{2}}\left( x^{\overline{n+2}} \sum_{i=0}^{n+1} \frac{1}{x+i} - g(x) - (x-1)g'(x) \right).$$

At x = 1 we have $1^{\overline{n+2}} = (n+2)!$ and $g(1) = (-1)^n n!$, and hence the
stated formula for F'(1) follows.

3. Finally, the second derivative of F is given by

$$F''(x) = \frac{1}{2\binom{n+2}{2}}\left( x^{\overline{n+2}} \sum_{0 \leq i \neq j \leq n+1} \frac{1}{(x+i)(x+j)} - 2g'(x) - (x-1)g''(x) \right).$$

The above sum evaluated at x = 1 equals

$$\sum_{0 \leq i \neq j \leq n+1} \frac{1}{(1+i)(1+j)} = \sum_{i,j=0}^{n+1} \frac{1}{(1+i)(1+j)} - \sum_{i=0}^{n+1} \frac{1}{(1+i)^2} = \left(\sum_{i=0}^{n+1} \frac{1}{1+i}\right)^2 - \sum_{i=0}^{n+1} \frac{1}{(1+i)^2} = H_{n+2}^2 - \sum_{k=1}^{n+2} \frac{1}{k^2}.$$

We also have

$$g'(x) = g(x)\left(\frac{1}{x} + \sum_{i=2}^{n+1} \frac{1}{x-i}\right),$$

and thus

$$g'(1) = g(1)\left(1 - \sum_{i=2}^{n+1} \frac{1}{i-1}\right) = (-1)^n\,n!\,(1 - H_n).$$

Using these expressions in the formula for F''(x) above, evaluated at x = 1,
gives the formula in the statement.

□

The recovery of the expected value of the Hultman numbers, previously
obtained by Bona and Flynn [5], is now an easy task.

Theorem 8.1. [5] For all $n \in \mathbb{N}_0$, the expected number of cycles in the
breakpoint graph of a uniform random unsigned permutation $\pi$ of n elements is

$$E(c(BG(\pi))) = H_n + \frac{1}{\lfloor (n+2)/2 \rfloor}.$$



Proof. As is well-known (see e.g. Wilf [21]), the expected value can be obtained
from the generating function F(x) by the formula F'(1)/F(1). Using the
formulas for F(1) and F'(1) obtained in Lemma 8.2, we obtain that the expected
value of the Hultman numbers equals

$$\frac{F'(1)}{F(1)} = H_{n+2} + \frac{(-1)^{n-1}}{(n+1)(n+2)},$$

which is easily seen to be equivalent to the expression in the statement. □
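For completeness, the equivalence between the two expressions can be spelled out as follows (our own expansion of the step left to the reader), starting from $H_{n+2} = H_n + \frac{1}{n+1} + \frac{1}{n+2}$:

```latex
\begin{align*}
H_{n+2} + \frac{(-1)^{n-1}}{(n+1)(n+2)}
  &= H_n + \frac{2n+3+(-1)^{n-1}}{(n+1)(n+2)}\\
  &= \begin{cases}
       H_n + \dfrac{2(n+1)}{(n+1)(n+2)} = H_n + \dfrac{1}{(n+2)/2}
         & \text{if } n \text{ is even},\\[2ex]
       H_n + \dfrac{2(n+2)}{(n+1)(n+2)} = H_n + \dfrac{1}{(n+1)/2}
         & \text{if } n \text{ is odd},
     \end{cases}
\end{align*}
```

and in both cases the last denominator equals $\lfloor (n+2)/2 \rfloor$.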

Furthermore, knowing the generating function also allows us to compute the 
variance of the Hultman numbers. We prove the following result. 

Theorem 8.2. For all $n \in \mathbb{N}_0$, the variance of the number of cycles in the
breakpoint graph of a uniform random unsigned permutation $\pi$ of n elements is

$$\mathrm{Var}(c(BG(\pi))) = H_{n+2} - \sum_{k=1}^{n+2} \frac{1}{k^2} + \frac{(-1)^n\,(2H_{n+2} + 2H_n - 3)}{(n+1)(n+2)} - \frac{1}{((n+1)(n+2))^2}.$$

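Proof. The variance can be obtained from the generating function F(x) by the
following formula (see e.g. Wilf [21]):

$$\mathrm{Var} = (\log F)'(1) + (\log F)''(1) = \frac{F'(1)}{F(1)} + \frac{F''(1)}{F(1)} - \left(\frac{F'(1)}{F(1)}\right)^2.$$

Using the formulas for F(1), F'(1) and F''(1) obtained in Lemma 8.2, we
obtain that the variance of the Hultman numbers equals

$$\begin{aligned}
&\frac{F'(1)}{F(1)} + \frac{F''(1)}{F(1)} - \left(\frac{F'(1)}{F(1)}\right)^2\\
&\quad= H_{n+2} + \frac{(-1)^{n-1}}{(n+1)(n+2)} + H_{n+2}^2 - \sum_{k=1}^{n+2}\frac{1}{k^2} + \frac{2(-1)^n (H_n - 1)}{(n+1)(n+2)} - \left(H_{n+2} + \frac{(-1)^{n-1}}{(n+1)(n+2)}\right)^2\\
&\quad= H_{n+2} - \sum_{k=1}^{n+2}\frac{1}{k^2} + \frac{(-1)^n (2H_{n+2} + 2H_n - 3)}{(n+1)(n+2)} - \frac{1}{((n+1)(n+2))^2}.
\end{aligned}$$

□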
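Both closed forms (Theorems 8.1 and 8.2) can be compared with the moments of the exact distribution given by Bona and Flynn's formula. The sketch below is ours; it uses exact rational arithmetic and hypothetical function names.

```python
from fractions import Fraction
from math import comb, factorial


def stirling_row(m):
    """Unsigned Stirling numbers of the first kind: row[k] = c(m, k)."""
    row = [1]
    for size in range(1, m + 1):
        prev = row + [0]
        row = [(prev[k - 1] if k else 0) + (size - 1) * prev[k]
               for k in range(size + 1)]
    return row


def hultman_row(n):
    """S_H(n, k) for k = 0..n+1, from Bona and Flynn's formula."""
    c = stirling_row(n + 2)
    return [c[k] // comb(n + 2, 2) if (n - k) % 2 else 0
            for k in range(n + 2)]


def H(n, power=1):
    """Generalised harmonic number, as an exact fraction."""
    return sum((Fraction(1, k ** power) for k in range(1, n + 1)), Fraction(0))


def mean_formula(n):
    """Theorem 8.1: H_n + 1 / floor((n + 2) / 2)."""
    return H(n) + Fraction(1, (n + 2) // 2)


def variance_formula(n):
    """Theorem 8.2."""
    d = (n + 1) * (n + 2)
    return (H(n + 2) - H(n + 2, 2)
            + Fraction((-1) ** n, d) * (2 * H(n + 2) + 2 * H(n) - 3)
            - Fraction(1, d * d))


def moments_from_distribution(n):
    """Exact mean and variance computed from the Hultman numbers."""
    row = hultman_row(n)
    total = sum(row)  # equals n!
    mean = Fraction(sum(k * s for k, s in enumerate(row)), total)
    second = Fraction(sum(k * k * s for k, s in enumerate(row)), total)
    return mean, second - mean ** 2


for n in (1, 2, 3, 10):
    print(n, moments_from_distribution(n), (mean_formula(n), variance_formula(n)))
```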



It is interesting to see how the mean and variance behave for large n.

Remark 8.1. The expected value and variance of the number of cycles in the
breakpoint graph of a uniform random unsigned permutation $\pi$ in $S_n$ have the
following asymptotic behaviour when $n \to \infty$:

$$E(c(BG(\pi))) = \log(n) + \gamma + o(1),$$

$$\mathrm{Var}(c(BG(\pi))) = \log(n) + \gamma - \frac{\pi^2}{6} + o(1).$$

Proof. For the expected value, the result simply follows from the fact that
$E(c(BG(\pi))) = H_n + o(1)$ and $H_n = \log(n) + \gamma + o(1)$.

For the variance, first note that $\mathrm{Var}(c(BG(\pi))) = H_{n+2} - \sum_{k=1}^{n+2} \frac{1}{k^2} + o(1)$.
By further using the fact that $\log(n + 2) = \log(n) + o(1)$ and the well-known
result $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$, the stated asymptotic formula follows. □

Interestingly, we recover exactly the same asymptotic behaviour as for the
number of cycles in the usual disjoint cycle decomposition (recall Equations (17)
and (18)).

8.2. The signed case

We now turn to the problem of computing the expected value and the
variance of the signed Hultman numbers. As in the unsigned case, we start with
the computation of the generating function for the signed Hultman numbers.

Lemma 8.3. We have

$$G(x) = \sum_{k=1}^{n+1} S_H^{\pm}(n,k)\,x^k = \sum_{\lambda} c_\lambda(2)\,F_\lambda(x)\,F'_\lambda(0),$$

where $\lambda$ is subject to the same restrictions as in Theorem 5.1 and $F_\lambda$ is
defined as earlier in this paper.



Proof. Recall (Remark l5.ll page IT51) that Sg(n, k) is the coefficient of the mono- 
mial h k £ in the polynomial Q^ +1 (h,£). If we take now h = x and consider 
Q^ +1 (x,£) as a polynomial only in the variable £, we note that the coefficient 
of the monomial £ is obtained by summing up all the terms Sf I {n 1 k)x k , for 
k = 1, . . . , n + 1. Therefore, G{x) equals the coefficient of £ in <5* +1 (a;, £), and 
hence 

G{x)=?-Ql +1 (x,l) . 
The formula in the statement easily follows from Equation |3) page [TTJ □ 



In order to compute the expected value and the variance of the signed
Hultman numbers, we will need the following preliminary lemma.

Lemma 8.4. Let $n \geq 1$ and $\lambda$ a partition of n + 1 of the form $(a, b, 1^{n-a-b+1})$.

1. In the case where $a \geq b \geq 1$, we have:

$$F'_\lambda(0) = \frac{(-1)^{n-a-b}\,2^{a-b}\,(a-1)!\,(2b-2)!\,(n-a-b+2)!}{(b-1)!},$$

$$F'_\lambda(1) = \frac{(-1)^{n-a-b+1}\,(2a-1)!\,(b-1)!\,(n-a-b+1)!}{2^{a-b}\,(a-1)!},$$

$$F''_\lambda(1) = F'_\lambda(1)\,\{2H_{2a-1} - 2H_{n-a-b+1} - H_{a-1} + H_{b-1}\}.$$

2. In the case where $\lambda = (n + 1)$, we have:

$$F'_{(n+1)}(0) = 2^n\,n!,$$

$$F'_{(n+1)}(1) = \frac{(2n+1)!}{2^n\,n!}\,(H_{2n+1} - H_n/2),$$

$$F''_{(n+1)}(1) = \frac{(2n+1)!}{2^n\,n!}\left\{(H_{2n+1} - H_n/2)^2 - \sum_{k=0}^{n} \frac{1}{(2k+1)^2}\right\}.$$

Proof. We handle both cases separately.

1. Let us first examine the case where $\lambda = (a, b, 1^{n+1-a-b})$ and $a \geq b \geq 1$. In
order to simplify the proof, we write $F_\lambda(x) = x(x-1)h_\lambda(x)$, where $h_\lambda(x)$
is obtained and defined as follows (see the definition of $F_\lambda$ and footnote 7):

$$\begin{aligned}
F_\lambda(x) &= 2^{a-b}\,(x/2 + a - 1)^{\underline{a-b}}\,(x + 2b - 2)^{\underline{n+1-a+b}}\\
&= 2^{a-b}\,(x/2 + a - 1)^{\underline{a-b}}\,(x + 2b - 2)(x + 2b - 3) \cdots (x+1)\,x\,(x-1)\\
&\qquad \times (x-2)(x-3) \cdots (x - 2 + b - n + a)\\
&= x(x-1)\,2^{a-b}\,(x/2 + a - 1)^{\underline{a-b}}\,(x + 2b - 2)^{\underline{2b-2}}\,(x-2)^{\underline{n-a-b+1}}.
\end{aligned}$$



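(a) Using the above notation, we have

$$F'_\lambda(0) = -h_\lambda(0) = (-1)\,2^{a-b}\,(a-1)^{\underline{a-b}}\,(2b-2)!\,(-2)^{\underline{n-a-b+1}},$$

from which we easily obtain the wanted expression.

(b) We also have

$$\begin{aligned}
F'_\lambda(1) = h_\lambda(1) &= 2^{a-b}\,(a - 1/2)^{\underline{a-b}}\,(2b-1)^{\underline{2b-2}}\,(-1)^{\underline{n-a-b+1}}\\
&= 2^{a-b}\,(a - 1/2)^{\underline{a-b}}\,(2b-1)!\,(-1)^{\underline{n-a-b+1}},
\end{aligned}$$

and obtaining the formula for $F'_\lambda(1)$ given in the statement is a simple
matter, using the fact that

$$\begin{aligned}
(a - 1/2)^{\underline{a-b}} &= \frac{(2a-1)(2a-3)\cdots(2b+1)}{2^{a-b}}\\
&= \frac{1}{2^{a-b}} \cdot \frac{(2a-1)!}{(a-1)!\,2^{a-1}} \cdot \frac{(b-1)!\,2^{b-1}}{(2b-1)!}\\
&= \frac{(2a-1)!\,b!}{2^{a-b-1}\,(a-1)!\,2^{a-b}\,(2b)!}.
\end{aligned}$$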

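(c) In order to simplify the computation of the second derivative, we will
write $F_\lambda(x) = (x-1)g_\lambda(x)$, where

$$g_\lambda(x) = \underbrace{2^{a-b}\,(x/2 + a - 1)^{\underline{a-b}}}_{=\alpha_\lambda(x)}\;\underbrace{(x + 2b - 2)^{\underline{2b-1}}}_{=\beta_\lambda(x)}\;\underbrace{(x-2)^{\underline{n-a-b+1}}}_{=\gamma_\lambda(x)}.$$

With this notation, it is easy to see that $F''_\lambda(1) = 2g'_\lambda(1)$, with

$$g'_\lambda(1) = \alpha'_\lambda(1)\beta_\lambda(1)\gamma_\lambda(1) + \alpha_\lambda(1)\beta'_\lambda(1)\gamma_\lambda(1) + \alpha_\lambda(1)\beta_\lambda(1)\gamma'_\lambda(1).$$

Note that

$$\alpha'_\lambda(1) = \alpha_\lambda(1) \sum_{k=b+1}^{a} \frac{1}{2k-1} = \alpha_\lambda(1)\,\{H_{2a-1} - H_{2b} - (H_{a-1} - H_b)/2\},$$

$$\beta'_\lambda(1) = \beta_\lambda(1) \sum_{k=1}^{2b-1} \frac{1}{k} = \beta_\lambda(1)\,H_{2b-1},$$

$$\gamma'_\lambda(1) = -\gamma_\lambda(1) \sum_{k=1}^{n-a-b+1} \frac{1}{k} = -\gamma_\lambda(1)\,H_{n-a-b+1},$$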

7 Recall, as explained in the statement of Theorem 5.1, that we must replace n
with n + 1.
and

$$\alpha_\lambda(1) = \frac{(2a-1)!\,b!}{(2b)!\,2^{a-b-1}\,(a-1)!}, \qquad \beta_\lambda(1) = (2b-1)!, \qquad \gamma_\lambda(1) = (-1)^{n-a-b+1}\,(n-a-b+1)!.$$

Combining all of the above, we obtain:

$$\begin{aligned}
g'_\lambda(1) &= \alpha_\lambda(1)\beta_\lambda(1)\gamma_\lambda(1) \times \{H_{2a-1} - H_{2b} - (H_{a-1} - H_b)/2 + H_{2b-1} - H_{n-a-b+1}\}\\
&= \frac{(-1)^{n-a-b+1}\,(2a-1)!\,(b-1)!\,(n-a-b+1)!}{2^{a-b}\,(a-1)!} \times \{H_{2a-1} - H_{n-a-b+1} - (H_{a-1} - H_{b-1})/2\}
\end{aligned}$$

and we finally deduce the formula in the statement.
2. We now turn to the case where $\lambda = (n + 1)$, i.e. a = n + 1 and b = 0.

(a) Following the definition of $F_\lambda(x)$ (see footnote 8), we have

$$F_{(n+1)}(x) = 2^{n+1}\,(x/2 + n)^{\underline{n+1}} = x \prod_{k=1}^{n} (x + 2k).$$

We thus obtain

$$F'_{(n+1)}(x) = \prod_{k=1}^{n} (x + 2k) + F_{(n+1)}(x) \sum_{k=1}^{n} \frac{1}{x + 2k},$$

which easily gives the wanted expressions when evaluated at x = 0
and x = 1.

(b) For the second derivative, we obtain

$$F''_{(n+1)}(x) = F_{(n+1)}(x) \sum_{0 \leq i \neq j \leq n} \frac{1}{(x + 2i)(x + 2j)},$$

hence

$$F''_{(n+1)}(1) = \frac{(2n+1)!}{2^n\,n!}\left\{\left(\sum_{k=0}^{n} \frac{1}{2k+1}\right)^2 - \sum_{k=0}^{n} \frac{1}{(2k+1)^2}\right\},$$

and the formula in the statement follows.

□



8 Again, we replace n with n + 1 in the definition.
Knowing the generating function G, we can easily obtain the expected value
of the number of cycles in the breakpoint graph of a random signed permutation
of n elements.

Theorem 8.3. The expected value of the number of cycles in the breakpoint
graph of a uniform random signed permutation of n elements is

$$E(c(BG(\pi^{\pm}))) = H_{2n+1} - \frac{H_n}{2} - \sum_{(a,b) \in \mathcal{A}_n} r_n(a,b),$$

where $\mathcal{A}_n = \{(a, b) \in \mathbb{N}^2 : a \geq b \geq 1,\ a + b \leq n + 1\}$ and

$$r_n(a,b) = \frac{(-1)^{n+a-b}\,(n+1)\,(2a-2b+1)\,(a-1)!\,(2b-2)!\,(n-a-b+2)!}{2^{n-a+b-1}\,n!\,(b-1)!\,(n+a-b+2)^{\underline{2}}\,(n-a+b+1)^{\underline{2}}}.$$

Proof. As recalled in the proof of Theorem 8.1, we have $E(c(BG(\pi^{\pm}))) =
G'(1)/G(1)$. Note that, by definition, $G(1) = \sum_{k=1}^{n+1} S_H^{\pm}(n,k)$, which equals
the number of signed permutations of n elements, i.e. $2^n n!$. By Lemma 8.3, the
expected number of cycles in the breakpoint graph of a random signed
permutation is

$$E(c(BG(\pi^{\pm}))) = \frac{1}{2^n\,n!} \sum_{\lambda} c_\lambda(2)\,F'_\lambda(1)\,F'_\lambda(0).$$

Using the formulas for $F'_\lambda(1)$ and $F'_\lambda(0)$ derived in Lemma 8.4 and the
expression for the coefficient $c_\lambda(2)$ (see footnote 9) given in Equations (10)
and (11), the formula in the statement follows. □
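As a worked instance of Theorem 8.3 (our own illustration), take n = 1. Then $\mathcal{A}_1 = \{(1,1)\}$, and with $x^{\underline{2}} = x(x-1)$:

```latex
\begin{align*}
r_1(1,1) &= \frac{(-1)^{1}\cdot 2\cdot 1\cdot 0!\cdot 0!\cdot 1!}
                 {2^{0}\cdot 1!\cdot 0!\cdot 3^{\underline{2}}\cdot 2^{\underline{2}}}
          = \frac{-2}{6\cdot 2} = -\frac{1}{6},\\
E(c(BG(\pi^{\pm}))) &= H_3 - \frac{H_1}{2} - r_1(1,1)
          = \frac{11}{6} - \frac{1}{2} + \frac{1}{6} = \frac{3}{2}.
\end{align*}
```

This matches a direct count: the two signed permutations of one element are (+1), whose breakpoint graph has two cycles, and (-1), whose breakpoint graph has one cycle.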

The generating function G also allows us to compute the variance of the
signed Hultman numbers.

Theorem 8.4. The variance of the number of cycles in the breakpoint graph of
a uniform random signed permutation $\pi^{\pm}$ of n elements is

$$\begin{aligned}
\mathrm{Var}(c(BG(\pi^{\pm}))) = {} & H_{2n+1} - \frac{H_n}{2} - \sum_{k=0}^{n} \frac{1}{(2k+1)^2} - \left(\sum_{(a,b) \in \mathcal{A}_n} r_n(a,b)\right)^2\\
& + \sum_{(a,b) \in \mathcal{A}_n} r_n(a,b)\,\{2H_{2n+1} - H_n - 2H_{2a-1} + 2H_{n-a-b+1} + H_{a-1} - H_{b-1} - 1\},
\end{aligned}$$

where $\mathcal{A}_n$ and the coefficients $r_n(a,b)$ are as defined in Theorem 8.3.



9 Again, we replace n with n + 1 in the definitions.

Proof. As recalled in the proof of Theorem 8.2, the variance can be obtained
from the generating function G by evaluating the function $(\log G)'(x) + (\log G)''(x)$
at x = 1. Therefore, the variance of the number of cycles in the breakpoint graph
of a random signed permutation equals

$$\begin{aligned}
\frac{G'(1)}{G(1)} + \frac{G''(1)}{G(1)} - \left(\frac{G'(1)}{G(1)}\right)^2
&= \frac{G'(1) + G''(1)}{G(1)} - \left(E(c(BG(\pi^{\pm})))\right)^2\\
&= \frac{1}{2^n\,n!} \sum_{\lambda} c_\lambda(2)\,\big(F'_\lambda(1) + F''_\lambda(1)\big)\,F'_\lambda(0) - \left(E(c(BG(\pi^{\pm})))\right)^2 \quad \text{(using Lemma 8.3).}
\end{aligned}$$

Using the formulas for $F'_\lambda(1)$, $F''_\lambda(1)$ and $F'_\lambda(0)$ given in Lemma 8.4, we obtain
that the variance equals

$$\begin{aligned}
& H_{2n+1} - \frac{H_n}{2} - \sum_{k=0}^{n} \frac{1}{(2k+1)^2} + \left(H_{2n+1} - \frac{H_n}{2}\right)^2 - \left(E(c(BG(\pi^{\pm})))\right)^2\\
& \quad - \sum_{(a,b) \in \mathcal{A}_n} r_n(a,b)\,\{2H_{2a-1} - 2H_{n-a-b+1} - H_{a-1} + H_{b-1} + 1\},
\end{aligned}$$

which equals the wanted expression once $E(c(BG(\pi^{\pm})))$ is replaced with the
value derived in Theorem 8.3. □
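The formulas of Theorems 8.3 and 8.4 can be evaluated exactly and compared with the moments of small distributions. The sketch below is ours and uses hypothetical names. The rows hard-coded for n = 1, 2, 3 are an assumption derived from facts stated in the text: one permutation (the identity) with n + 1 cycles, $\binom{n+1}{2}$ permutations with n cycles, Proposition 6.1 for n - 1 cycles, and a total of $2^n n!$ signed permutations.

```python
from fractions import Fraction
from math import factorial

# ROWS[n][k - 1] = number of signed permutations of n elements whose
# breakpoint graph has k cycles (assumed values, see the lead-in).
ROWS = {1: [1, 1], 2: [4, 3, 1], 3: [20, 21, 6, 1]}


def H(n):
    """Harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def falling2(x):
    """Falling factorial power of order 2: x (x - 1)."""
    return x * (x - 1)


def pairs(n):
    """The index set A_n = {(a, b) : a >= b >= 1, a + b <= n + 1}."""
    return [(a, b) for b in range(1, n + 1) for a in range(b, n + 2 - b)]


def r(n, a, b):
    """Coefficient r_n(a, b) of Theorem 8.3."""
    num = ((-1) ** (n + a - b) * (n + 1) * (2 * a - 2 * b + 1)
           * factorial(a - 1) * factorial(2 * b - 2) * factorial(n - a - b + 2))
    den = (Fraction(2) ** (n - a + b - 1) * factorial(n) * factorial(b - 1)
           * falling2(n + a - b + 2) * falling2(n - a + b + 1))
    return num / den


def mean_formula(n):
    """Theorem 8.3."""
    return H(2 * n + 1) - H(n) / 2 - sum(r(n, a, b) for a, b in pairs(n))


def variance_formula(n):
    """Theorem 8.4."""
    main = H(2 * n + 1) - H(n) / 2
    odd_squares = sum(Fraction(1, (2 * k + 1) ** 2) for k in range(n + 1))
    r_sum = sum(r(n, a, b) for a, b in pairs(n))
    weighted = sum(
        r(n, a, b) * (2 * H(2 * n + 1) - H(n) - 2 * H(2 * a - 1)
                      + 2 * H(n - a - b + 1) + H(a - 1) - H(b - 1) - 1)
        for a, b in pairs(n))
    return main - odd_squares - r_sum ** 2 + weighted


def moments(row):
    """Exact mean and variance of the cycle count for a distribution row."""
    total = sum(row)
    mean = Fraction(sum(k * s for k, s in enumerate(row, start=1)), total)
    second = Fraction(sum(k * k * s for k, s in enumerate(row, start=1)), total)
    return mean, second - mean ** 2


for n, row in ROWS.items():
    print(n, moments(row), (mean_formula(n), variance_formula(n)))
```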

As in the unsigned case, we will study the behaviour of the mean and variance
for large values of n. To that end, we will first prove the following lemma.

Lemma 8.5. As $n \to \infty$, we have

$$\sum_{(a,b) \in \mathcal{A}_n} |r_n(a,b)| = \frac{1}{\log(n)} \times o(1).$$

Proof. If we denote k = a - b, the above sum becomes

$$\sum_{k=0}^{n-1} \frac{2^{k-n+1}\,(n+1)\,(2k+1)}{n!\,(n+k+2)^{\underline{2}}\,(n-k+1)^{\underline{2}}} \sum_{b=1}^{\lfloor (n-k+1)/2 \rfloor} \frac{(k+b-1)!\,(2b-2)!\,(n-k-2b+2)!}{(b-1)!}$$

^ 2 fc -+ 1 (n + l)(2fc + 1) L(n-W)/2J (fc+ 6-i) 

-1 2fc -n+l K»-W)/2J ^ + ft _ 1 

71-1 nk-n+1 (k+l(n-k+l)/2\\ 



We further observe that 



ofc-rj+l 



, 2«— n+r / 1 \ 1 

(a,b)e.A„ fc=0 V 7 
and the result in the statement easily follows.



□ 



Based on this lemma, we can now obtain the following.

Remark 8.2. When $n \to \infty$, the expected value and variance of the number of
cycles in the breakpoint graph of a uniform random signed permutation of n
elements have the following asymptotic behaviour:

$$E(c(BG(\pi^{\pm}))) = \frac{\log(n)}{2} + \frac{\gamma}{2} + \log(2) + o(1),$$

$$\mathrm{Var}(c(BG(\pi^{\pm}))) = \frac{\log(n)}{2} + \frac{\gamma}{2} + \log(2) - \frac{\pi^2}{8} + o(1).$$

Note that, in the limit when $n \to \infty$, the mean and variance in the signed
case are of the same order (log(n)) as in the unsigned case, but they differ by a
factor of 1/2.

9. Applications: Distributions of rearrangement distances 

As stated in the introduction of this paper, the breakpoint graph and its cy- 
cles are used in a lot of variants of genome rearrangement problems to compute 
evolutionary distances - either exactly or approximately. In this section, we are 
interested in exploring to what extent we can rely on those cycles in order to 
approximate the distribution of several distances that have been studied in the 
field of genome rearrangements, so as to obtain a better idea of how tight a 
particular bound on a distance is, or whether it is worth computing a distance 
exactly in cases where this requires solving an NP-hard problem. By "distribu- 
tion of a distance" , we mean the number of (possibly signed) permutations of n 
elements whose distance equals k, for all possible values of k. 

We will not say much about rearrangement distances or how to compute
them, except for the fact that, as already stated earlier in this paper, they are
based on a set S of operations that generate $S_n$ (resp. $S_n^{\pm}$). In the following,
what we mean by expressions like "the S distance of $\pi$" is the minimum number
of operations from S needed to transform a given permutation $\pi$ into the identity
permutation $\iota$; a few examples of such operations that we will consider here
are summarised informally in Table 2. The reader should bear in mind that
Permutations  Distance  Operation               Description of the operation
unsigned      bid       block-interchange       exchanges two not necessarily adjacent segments
unsigned      td        transposition           exchanges two adjacent segments
unsigned      ptd       prefix transposition    transposition involving $\pi_1, \pi_2, \ldots, \pi_k$ for some k
unsigned      rd        reversal                reverses a segment
unsigned      prd       prefix reversal         reversal involving $\pi_1, \pi_2, \ldots, \pi_k$ for some k
signed        srd       signed reversal         reverses a segment and flips the signs in that segment
signed        psrd      prefix signed reversal  signed reversal involving $\pi_1, \pi_2, \ldots, \pi_k$ for some k

Table 2: Some abbreviations and informal definitions used throughout this section.



the discussion presented in this section focuses on experiments with relatively 
small amounts of data (mainly because many interesting distances are hard to 
compute, and because the number of (signed) permutations grows much too fast 
to generate the full distributions for large values of n), which is why we refrain 
from making any bold conjecture or actually proving any result. We will also 
restrict ourselves to comparing distributions for one fixed value of n, namely, 
the largest value for which we could obtain the distribution of the particular 
distance we are interested in; similar-looking plots can however be obtained for 
any value. We generated the distributions based on cycles of the breakpoint 
graph ourselves, but the distributions of the distances we consider here were 
computed by Galvao and Dias [22].
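To make the informal definitions of Table 2 concrete, the operations can be written as small functions on Python lists. This is an illustrative sketch of ours: the function names and the half-open index convention (segments are `p[i:j]`) are assumptions, not notation from the paper.

```python
def block_interchange(p, i, j, k, l):
    """Exchange the segments p[i:j] and p[k:l], where i < j <= k < l."""
    return p[:i] + p[k:l] + p[j:k] + p[i:j] + p[l:]


def transposition(p, i, j, k):
    """Exchange the adjacent segments p[i:j] and p[j:k]."""
    return block_interchange(p, i, j, j, k)


def reversal(p, i, j):
    """Reverse the segment p[i:j]."""
    return p[:i] + p[i:j][::-1] + p[j:]


def signed_reversal(p, i, j):
    """Reverse the segment p[i:j] and flip the sign of its elements."""
    return p[:i] + [-x for x in reversed(p[i:j])] + p[j:]


# The prefix variants act on an initial segment, i.e. they fix i = 0.
def prefix_transposition(p, j, k):
    return transposition(p, 0, j, k)


def prefix_reversal(p, j):
    return reversal(p, 0, j)


def prefix_signed_reversal(p, j):
    return signed_reversal(p, 0, j)


print(block_interchange([1, 2, 3, 4, 5], 0, 1, 3, 5))
print(signed_reversal([1, -2, 3], 0, 2))
```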



9.1. Unsigned distances 

A few distances between unsigned permutations have been considered in the
field of genome rearrangements [3]. Doignon and Labarre [1] already observed
that $S_H(n, n + 1 - 2k)$ is exactly the number of permutations $\pi$ in $S_n$ whose
block-interchange distance $bid(\pi)$ equals k, an immediate consequence of the
following result.

Theorem 9.1. [6] For all $\pi$ in $S_n$, we have $bid(\pi) = (n + 1 - c(BG(\pi)))/2$.
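Theorem 9.1 can be checked on small cases by computing block-interchange distances with a breadth-first search from the identity. The sketch below is ours; it assumes that the breakpoint graph of an unsigned permutation has the same cycles as that of the signed permutation with all signs positive, and the function names are hypothetical.

```python
from collections import deque


def cycle_count(perm):
    """Cycles in the breakpoint graph of an unsigned permutation of 1..n."""
    n = len(perm)
    seq = [0] + [x for v in perm for x in (2 * v - 1, 2 * v)] + [2 * n + 1]
    black = {}
    for i in range(n + 1):
        a, b = seq[2 * i], seq[2 * i + 1]
        black[a], black[b] = b, a
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.update((v, black[v]))
            v = black[v] ^ 1  # grey edge 2i <-> 2i+1
    return cycles


def block_interchange_distances(n):
    """Map each permutation of 1..n to its block-interchange distance."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        # Try every pair of disjoint segments p[i:j] and p[k:l].
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j, n):
                    for l in range(k + 1, n + 1):
                        q = p[:i] + p[k:l] + p[j:k] + p[i:j] + p[l:]
                        if q not in dist:
                            dist[q] = dist[p] + 1
                            queue.append(q)
    return dist


for n in range(1, 6):
    dist = block_interchange_distances(n)
    ok = all(2 * d == n + 1 - cycle_count(p) for p, d in dist.items())
    print(n, len(dist), ok)
```

Searching from the identity is enough because the inverse of a block-interchange is again a block-interchange, so the number of operations needed to sort a permutation equals the number needed to reach it from the identity.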

Whereas sorting by block-interchanges and computing $bid(\pi)$ can be achieved
in polynomial time [6], this is not known to be the case for any of the other
unsigned operations listed in Table 2: sorting by transpositions and sorting by
reversals, as well as computing the related distances, are NP-hard problems (see
Bulteau et al. [23] and Caprara [24], respectively); the same problems in the
context of prefix reversals are also NP-hard [25], while their complexity in the
case of prefix transpositions is open.

However, since transpositions are but a particular case of block-interchanges,
the expression given in Theorem 9.1 for computing $bid(\pi)$ is also a lower bound
on the transposition distance $td(\pi)$. Additionally, a tighter lower bound on the
transposition distance was proved by Bafna and Pevzner [26].

Theorem 9.2. [26] For all $\pi$ in $S_n$, we have $td(\pi) \geq (n + 1 - c_{odd}(BG(\pi)))/2$,
where $c_{odd}(BG(\pi))$ is the number of cycles of odd length in $BG(\pi)$.

Consequently, it makes sense to try to approximate the distribution of the
transposition distance using $S_H(n, n + 1 - 2k)$ (because of Theorem 9.1) and what
could be called the odd Hultman numbers $S_H^{odd}(n, n + 1 - 2k)$, i.e. the number
of permutations of n elements whose breakpoint graph contains n + 1 - 2k
cycles of odd length (because of Theorem 9.2). Figure 8(a) compares all three
distributions for n = 13. To the best of our knowledge, there is no known
formula for computing odd Hultman numbers.

Dias and Meidanis [27] initiated the study of prefix transpositions, which are
transpositions that can only be applied to an initial segment of the permutation
to sort. To the best of our knowledge, the complexity of sorting by prefix
transpositions or computing the corresponding distance is still open. However,
a lower bound on the prefix transposition distance based on the breakpoint
graph is known.

Theorem 9.3. [28] For any $\pi$ in $S_n$, we have

$$ptd(\pi) \geq \frac{n + 1 + c(BG(\pi))}{2} - c_1(BG(\pi)) - \begin{cases} 0 & \text{if } \pi_1 = 1,\\ 1 & \text{otherwise,} \end{cases} \tag{19}$$

where $c_1(BG(\pi))$ is the number of cycles of length 1 in $BG(\pi)$.



Figure 8(b) shows the distribution of the prefix transposition distance,
together with some function of the Hultman numbers and the distribution of
the number of permutations in $S_n$ for which lower bound (19) equals k for
n = 13. On this particular plot and the forthcoming ones, we find the
offset m in $S_H(n, n + 1 - k + m)$ experimentally by shifting the distribution of
$S_H(n, n + 1 - k)$ so that it best fits the distribution of the distance we are
interested in.

Figure 8: (a) How the distributions of the unsigned and odd Hultman numbers relate to the
distribution of the transposition distance, for n = 13; (b) how the distributions of the unsigned
Hultman numbers and the number of permutations for which lower bound (19) equals k relate
to the distribution of the prefix transposition distance, for n = 13.



Two other distances that have received a considerable amount of attention
are the reversal distance, where a reversal reverses the order of the elements
contained in the segment of the permutation on which it acts, and the prefix
reversal distance, where prefix reversals have the same effect as reversals but may
only be applied to an initial segment of the permutation. Caprara [24] showed
that computing the former is NP-hard, while Bulteau et al. [25] proved that
computing the latter is NP-hard. Again, we find it interesting to examine how
the distribution of the number of cycles in the breakpoint graph relates to those
distances, which we do in Figure 9. We warn the reader familiar with breakpoint
graphs, however, that the breakpoint graph used in our paper differs from the
structure traditionally used for the study of these two distances, which admits
more than one cycle decomposition; the graph we use can be seen as the result
of selecting one particular decomposition among all possible decompositions. In
this setting, there is a much larger difference between the distributions of both
distances and of the unsigned Hultman numbers than what we have observed
for transpositions in Figure 8, which confirms that using only (our version of)
the breakpoint graph in this case is not enough.

Figure 9: How the distribution of the unsigned Hultman numbers relates to the distribution
of (a) the reversal distance and (b) the prefix reversal distance, for n = 13.



9.2. Signed distances 

A number of well-studied and biologically relevant distances between signed
permutations are also based on the breakpoint graph. These include the double
cut-and-join (DCJ) distance, introduced by Yancopoulos et al. [29], who showed
that its value could be computed using the formula $dcj(\pi) = n + 1 - c(BG(\pi))$.
As a consequence, the number of signed permutations of n elements with DCJ
distance k is exactly $S_H^{\pm}(n, n + 1 - k)$.

Another distance whose distribution can be well approximated using the
signed Hultman numbers is the signed reversal distance (see Table 2 for an
informal definition of signed reversals). Hannenhalli and Pevzner [30] proved the
following formula for computing the signed reversal distance of any permutation
$\pi$, denoted by $srd(\pi)$.

Theorem 9.4. [30] For any $\pi$ in $S_n^{\pm}$, the signed reversal distance of $\pi$ is

$$srd(\pi) = n + 1 - c(BG(\pi)) + h(\pi) + f(\pi),$$

where $h(\pi)$ is the number of "hurdles" of $\pi$ and $f(\pi) = 1$ if $\pi$ is a "fortress",
and 0 otherwise.
We will not give more details on the terms "hurdles" and "fortress" (see
Hannenhalli and Pevzner [30] for definitions), except for the fact that hurdles
are particular collections of cycles in $BG(\pi)$, and that a permutation cannot be
a fortress unless $h(\pi) > 0$. Our point here is that the following lower bound,
first proved by Bafna and Pevzner, is extremely tight:

$$\forall\, \pi \in S_n^{\pm} : srd(\pi) \geq n + 1 - c(BG(\pi)). \tag{20}$$

This claim is supported by Caprara's proof [31] of the fact that the probability
that a permutation $\pi \in S_n^{\pm}$ is not tight with respect to Equation (20) is
$O(n^{-2})$, and by Swenson et al.'s proof [32] that the probability that it is a
fortress is $\Theta(n^{-15})$. Therefore, Equation (20) provides a very good
approximation of the signed reversal distance, and the distribution of
$S_H^{\pm}(n, n + 1 - k)$ closely matches that of the signed reversal distance. Figure 10
illustrates the situation for the case n = 10.

Figure 10: The distributions of the signed reversal distance and of the signed Hultman
numbers, for n = 10.



Other distances have not been studied with that level of detail, which is why
we find it interesting to try to relate their distribution to that of the Hultman
numbers. A particular restriction of the signed reversal distance is the prefix
signed reversal distance, denoted by $psrd(\cdot)$, whose definition follows that of
the signed reversal distance except that reversals can only act on an initial
segment of the permutation. No formula is known for computing that distance,
and the computational complexity of the problem has remained open since the
first works on the subject [33]. However, a lower bound based on the breakpoint
graph was recently obtained by Labarre and Cibulka [34], which naturally
prompts us to wonder how exactly we can rely on the breakpoint graph to
approximate that distance.

Theorem 9.5. [34] For any $\pi$ in $S_n^{\pm}$, we have

$$psrd(\pi) \geq n + 1 + c(BG(\pi)) - 2c_1(BG(\pi)) - \begin{cases} 0 & \text{if } \pi_1 = 1,\\ 2 & \text{otherwise.} \end{cases} \tag{21}$$

Figure 11 shows a plot with the distribution of the prefix signed reversal
distance and that of the signed Hultman numbers, as well as of the distribution
of lower bound (21) for n = 10. It can be seen on that graph that the latter is
quite far off from the distribution of the prefix signed reversal distance, hinting
that additional work seems needed to reduce the gap between the lower bound
and the actual distance.

Figure 11: The distributions of the prefix signed reversal distance, of the signed Hultman
numbers, and of the number of permutations for which lower bound (21) equals k, for n = 10.
10. Conclusions



In this paper, we proved the first explicit formula for enumerating signed 
permutations whose breakpoint graph contains a given number of cycles, and 
proved simpler expressions for particular cases. We also obtained a new expres- 
sion for enumerating unsigned permutations whose breakpoint graph contains a 
given number of cycles, and used both formulas to derive simpler proofs of some 
other previously known results. Getting more insight into breakpoint graphs 
and their cycle decomposition is particularly relevant to edit distances used in 
the field of genome rearrangements, and we hope that our results can help shed 
light on their distributions, expected values and variances. There are several 
interesting directions in which our work could be extended, which we outline 
and motivate below. 

Just like one can define conjugacy classes in the symmetric and
hyperoctahedral groups, we could investigate conjugacy classes with respect to the
breakpoint graph. This was already initiated by Doignon and Labarre [1], who
referred to them as "Hultman classes" and provided explicit formulas for
enumerating those classes in the case of unsigned permutations. More work remains
to be done in the unsigned case: indeed, the work done by Bona and Flynn [5]
provides us with a very nice formula for computing the distribution of cycles,
but no simpler expression than the complicated ones obtained by Doignon and
Labarre [1] is yet known for enumerating Hultman classes or their cardinalities.
Moreover, no work so far has been done in order to enumerate Hultman classes
in the signed setting, and obtaining an expression for enumerating the so-called
"simple permutations", which are defined in this context as permutations whose
breakpoint graph contains no cycle of length greater than 2, seems especially
interesting (for more information about the importance of those permutations
in genome rearrangements, see Hannenhalli and Pevzner [30] and Labarre and
Cibulka [34]).

The expression we obtained for the signed Hultman numbers is quite useful
in practice, since it allows us to obtain the distribution of those numbers for large
values of n. Unfortunately, it does not seem easy to use in order to gain insights
and have an intuitive interpretation of the shape of the distribution, which would
be useful in order to know how this distribution can be approximated or how it
grows as n increases. Finding simpler generating functions, recurrence relations
or nicer formulas would be useful in that regard and in order to obtain more
information on the properties of this distribution.

The connection between the cycle structure of breakpoint graphs and
factorisations of even permutations (Corollary 4.1) not only proved useful
in characterising the distribution of those cycles and of the related cycle types,
but also provided the foundations of a simple and generic method for obtaining
lower bounds on any "revertible" edit distance between unsigned permutations
(see Labarre [28] for more details). Is there any way to use the results and
connections obtained earlier in this paper in order to obtain similar results for
signed permutations?

Finally, recall that permutations are just one way of modelling genomes. One
natural direction would be to investigate the distribution of cycles in the
breakpoint graph of other structures, like set systems or "fragmented" permutations
(see again Fertin et al. [3] for an overview of existing models).